Robots and Biological Systems: 
Towards a New Bionics? 




NATO ASI Series 

Advanced Science Institutes Series 

A series presenting the results of activities sponsored by the NATO Science
Committee, which aims at the dissemination of advanced scientific and
technological knowledge, with a view to strengthening links between scientific
communities.

The Series is published by an international board of publishers in conjunction with
the NATO Scientific Affairs Division.

A Life Sciences
B Physics
C Mathematical and Physical Sciences
D Behavioural and Social Sciences
E Applied Sciences
F Computer and Systems Sciences
G Ecological Sciences
H Cell Biology
I Global Environmental Change

NATO-PCO DATABASE 

The electronic index to the NATO ASI Series provides full bibliographical 
references (with keywords and/or abstracts) to more than 30000 contributions 
from international scientists published in all sections of the NATO ASI Series. 
Access to the NATO-PCO DATABASE compiled by the NATO Publication 
Coordination Office is possible in two ways: 

- via online FILE 128 (NATO-PCO DATABASE) hosted by ESRIN, Via Galileo
Galilei, I-00044 Frascati, Italy.

- via CD-ROM “NATO-PCO DATABASE” with user-friendly retrieval software 
in English, French and German (© WTV GmbH and DATAWARE 
Technologies Inc. 1989). 



Plenum Publishing Corporation 
London and New York 

Kluwer Academic Publishers 
Dordrecht, Boston and London 



Springer-Verlag 
Berlin Heidelberg New York 
London Paris Tokyo Hong Kong 
Barcelona Budapest 




Series F: Computer and Systems Sciences Vol. 102 




Robots and Biological Systems: 
Towards a New Bionics? 



Edited by 

Paolo Dario 

ARTS Lab, Scuola Superiore S. Anna 
56127 Pisa, Italy 

Giulio Sandini 

DIST, Università degli Studi di Genova
16145 Genova, Italy 

Patrick Aebischer 

Division de Recherche chirurgicale, Pav. 3 
CHUV, 1011 Lausanne, Switzerland 



Springer-Verlag Berlin Heidelberg GmbH 




Proceedings of the NATO Advanced Workshop on Robots and Biological Systems, 
held at Il Ciocco, Toscana, Italy, June 26-30, 1989



CR Subject Classification (1991): I.2.9, J.3



ISBN 978-3-642-63461-1 ISBN 978-3-642-58069-7 (eBook) 
DOI 10.1007/978-3-642-58069-7 



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned,
specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on
microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is
permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version,
and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution
under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1993 

Originally published by Springer-Verlag Berlin Heidelberg New York in 1993 
Softcover reprint of the hardcover 1st edition 1993 
Typesetting: Camera ready by authors 
45/3140 - 5 4 3 2 1 0 - Printed on acid-free paper 




Preface 



Defined as a kind of applied cybernetics, bionics evolved in the 1960s as a framework 
to pursue the development of artificial systems based on the study of biological 
systems. The primary goal of bionics, as defined by one of its pioneers, Henning 
E. von Gierke, was “to extend man's physical and intellectual capabilities by prosthetic
devices in the most general sense, and to replace man by automata and intelligent
machines”. These objectives were pursued using models from the animal kingdom. For 
example, the ability of owls to swoop as quietly as they do, or the ability of beetles to 
create topographical maps of the terrain over which they fly, were examined. 
Numerous disciplines and technologies (some of which were still in their infancy), 
including artificial intelligence and learning devices, information processing, systems 
architecture and control, perception, sensory mechanisms, bioenergetics, etc., 
contributed to bionics research. 

Attempts to develop intelligent machines by uncovering the principles underlying
nature's examples proved more difficult than expected. In fact, the failure to
implement some of the proposed models of intelligent biological systems in
intelligent machines highlighted both the inadequacy of theoretical models alone and
the limitations of available technology.

This book contains revised papers originating from a NATO Advanced Research
Workshop on Robots and Biological Systems, held at Il Ciocco, Italy, in June 1989,
which was attended by about 60 scientists from 12 different countries. The purpose of
the workshop was to explore the relationship between biological systems and robotic
systems, and to discuss whether attempts to replicate the skilled
behavior of biological systems in artificial systems have a better chance of success
now than 30 years ago. The workshop was sponsored by the NATO Scientific Affairs
Division, within the special program on Sensory Systems for Robotic Control.

The different perspectives proposed during the presentations stimulated many 
fruitful discussions and identified some strategic areas and future objectives for 
research on “new bionics”. 







A consensus emerged on the value of the concept of “learning from nature” in 
order to derive guidelines for the design of intelligent machines (even if not 
incorporating anthropomorphic features) which operate in unstructured environments. 

The significant progress in basic knowledge and technology over
the past 30 years was recognized, and continuous development in
both areas was identified as crucial for the fields of bionics and robotics.
This research effort should be devoted, on the one hand, to understanding the
functions of biological systems and, on the other, to the study of artificial sensory and
motor subsystems and their coordination. Experimental robotics will play a crucial role
in this respect.

However, the goal of designing and fabricating robotic systems with capabilities
comparable to those of higher animal species remains elusive. Participants agreed that
current science and technology would allow the development of machines
incorporating insect-like intelligence. Far from being reductive, this objective is
significant, since an insect-like autonomous robot would rely on advanced mechanical
structures, sensors, actuators, control mechanisms, energy sources, etc., in order to be
capable of intelligent sensory-motor behavior. Moreover, assembling
such insect-like robots as physical models of biological systems would allow
verification of the underlying design principles, as well as extensive testing of basic
technologies and sensory-motor integration. Finally, insect-like micro-machines could
have a number of useful applications in advanced robotics, e.g., in
monitoring and inspection.

In conclusion, the idea of revisiting the bionic approach to intelligent systems 
seems appropriate, especially in the light of present technological achievements and 
scientific knowledge. 

The book comprises seven parts, each reflecting the objective of comparing the
state of the art in critical areas of biological and machine intelligence. The first three
parts of the book are devoted to discussing sensory-motor aspects of vision, prosthetic
hands and tactile perception, and legged locomotion. The fourth part presents a 
systematic comparison of intelligent motor control in biological and artificial systems, 
and also includes novel design concepts for actuation mechanisms. The fifth and sixth 
parts cover some technological aspects related to sensors, actuators, and interfaces 
between artificial devices and the nervous system. The final part explores the problem 
of cooperation among multiple units, and the emergence of collective intelligent 
behavior. 







We wish to point out that the intensive work and fruitful discussions which were
essential in achieving a successful workshop would have been impossible without the
active and enthusiastic contribution of all participants. The environment at Il Ciocco
also played a significant role in creating a relaxed and constructive atmosphere.

Ms. Lucia Lilli was very helpful in the editorial work associated with the assembly 
of this book. 



January 1993 

Paolo Dario 
Giulio Sandini 
Patrick Aebischer 




Table of Contents 



Part 1. Vision and Dynamic Systems 



Active Perception and Exploratory Robotics 3 

R. Bajcsy 

Object Identification and Search: Animate Vision Alternatives to Image Interpretation 21

M.J. Swain, L.E. Wixson, D.H. Ballard

A Model of Human Feature Detection Based on Matched Filters 43

M.C. Morrone, D.C. Burr

Visualizing and Understanding Patterns of Brain Architecture 65

A. Rojer, E.L. Schwartz

Dynamic Vision 89

H. Wechsler, L. Zimmerman

A Model of the Acquisition of Object Representations in Human 3D Visual Recognition 99

S. Edelman, D. Weinshall, H.H. Bülthoff, T. Poggio



Part 2. Hands and Tactile Perception 

The Perception of Mechanical Stimuli Through the Skin of the Hand and Its Physiological Bases 121

R.T. Verrillo, S.J. Bolanowski, Jr.

Borrowing Some Ideas from Biological Manipulators to Design an Artificial One 139

V. Hayward

Mechanical Design for Whole-Arm Manipulation 153

W.T. Townsend, J.K. Salisbury

Whole-Hand Manipulation: Design of an Articulated Hand Exploiting All Its Parts to Increase Dexterity 165

G. Vassura, A. Bicchi







Stable Grasping and Manipulation by a Multifinger Hand with the Capability of Compliance Control 179

K. Tanie, M. Kaneko

Part 3. Locomotion

Mobile Robots - the Lessons from Nature 193

D.J. Todd

Quadruped Walking Machine - Creation of the Model of Motion 207

A. Morecki, T. Zielinska

Biped Locomotion by FNS: Control Issues and an ANN Implementation 223

G.F. Inbar

How Fast Can a Legged Robot Run? 239

J. Koechling, M.H. Raibert

Robot Biped Walking Stabilized with Trunk Motion 271

A. Takanishi

Part 4. Intelligent Motor Control

A New Concept of the Role of Proprioceptive and Recurrent Inhibitory Feedback in Motor Control 295

U. Windhorst

Analogic Models for Robot Programming 319

R. Zaccaria, P. Morasso, G. Vercelli

Structural Constraints and Computational Problems in Motor Control 339

F.A. Mussa-Ivaldi, E. Bizzi

Motion Control in Intelligent Machines 361

A. Meystel

Control of Contact in Robots and Biological Systems 395

N. Hogan

Motor Control Simulation of Time Optimal Fast Movement in Man 411

B. Hannaford, L. Stark







Constraints on Underspecified Target Trajectories 419

M.I. Jordan

Proposal for a Pattern Matching Task Controller for Sensor-Based Coordination of Robot Motions 445

M. Brooks

Sensory-Motor Mapping with a Sequential Network 457

L. Massone, E. Bizzi

Part 5. Design Technologies

Flexible Robot Manipulators and Grippers: Relatives of Elephant Trunks and Squid Tentacles 475

J.F. Wilson, D. Li, Z. Chen, R.T. George, Jr.

Progress in the Design and Control of Pseudomuscular Linear Actuators 495

G. Casalino, P. Chiarelli, D. De Rossi, G. Genuini, P. Morasso, M. Solari

Shape Memory Alloy Linear Actuators for Tendon-Based Biomorphic Actuating Systems 507

M. Bergamasco, F. Salsedo, P. Dario

CCD Retina and Neural Net Processor 535

A.M. Chiang

Retina-Like CCD Sensor for Active Vision 553

G. Sandini, P. Dario, DM. De Micheli, M. Tistarelli

Designing Artificial Structures from Biological Models 571

M. Aizawa, H. Shinohara

Design Strategies for Gas and Odour Sensors Which Mimic the Olfactory System 579

K.C. Persaud, J. Bartlett, P. Pelosi

Part 6. Interfacing Robots to Nervous System

Multi-Electrode Stimulation of Myelinated Nerve Fibers 605

P.H. Veltink, J.A. van Alsté, J. Holsheimer







The Role of Materials in Designing Nerve Guidance Channels and Chronic Neural Interfaces 625

R.F. Valentini, P. Aebischer

Regeneration-Type Peripheral Nerve Interfaces for Direct Man/Machine Communication 637

G.T.A. Kovacs, J.M. Rosen

Integrated Bioelectronic Transducers 667

M. Grattarola, A. Cambiaso, S. Cenderelli, S. Martinoia, M.T. Parodi, M. Tedesco

Part 7. Robot Societies and Self-Organization

A Robot Being 679

R.A. Brooks, A.M. Flynn

Swarm Intelligence in Cellular Robotic Systems 703

G. Beni, J. Wang

A Control Architecture for Cooperative Intelligent Robots 713

J.S. Albus

Cellular Robotics - Construction of Complicated Systems from Simple Functions 745

T. Fukuda, Y. Kawauchi

List of Participants 783




Part 1 

Vision and Dynamic Systems 





Active Perception and Exploratory Robotics 



Professor Ruzena Bajcsy 

Computer and Information Science Department 

University of Pennsylvania 

Philadelphia, Pennsylvania 19104 



1 Introduction 

Most past and present work in machine perception has involved extensive
static analysis of passively sampled data. However, it should be axiomatic
that perception is not passive, but active. Furthermore, most past and current
robotics research relies on rather rigid assumptions and models about the world,
objects and their relationships. It is not difficult to see that in realistic
situations these assumptions often do not hold, and hence the robots do not
perform as their designers expect.

Perceptual activity is exploratory, which implies probing and searching. We
do not just see, we look. We do not only touch, we feel. In the process,
our pupils adjust to the level of illumination, our eyes bring the world into
sharp focus, our eyes converge or diverge, we move our heads or change our
position to get a better view of something, and sometimes we even put on
spectacles.

Similarly, our hands adjust to the size of the object, to the surface coarseness
and to the hardness or compliance of the material. This adaptiveness is
crucial for survival in an uncertain and generally unfriendly world, as millennia
of experiments with different perceptual organizations have clearly demonstrated.
Although machine perception research has not yet produced an adequate account
or theory of active perception, some researchers have very recently recognized
the value of actively probing the environment and emphasized the importance of
data acquisition during perception, including head/eye movement [3] [7].

Because today's robots perform inadequately, we in the GRASP laboratory at the
University of Pennsylvania have, for the past five years, pursued research in
Active Perception and Exploratory Robotics. What follows is an exposé of our
theoretical foundation and some preliminary results. First, we shall describe
what we mean by Active Perception, then we shall argue that Perception must also
include manipulation, and finally, we will present Exploratory Robotics as a
paradigm for extracting physical properties from an unknown environment.







2 What is Active Perception? 

In the robotics and computer vision literature, the term “active sensor” generally
refers to a sensor that transmits energy (generally electromagnetic radiation, as
in radar, microwaves and collimated light, or sound, as in sonar and ultrasound)
into the environment and receives and measures the reflected signals. We include
Active Sensing under the term Active Perception. We believe that the use of active
sensors is not a necessary condition for active sensing, and that active sensing
can be performed with passive sensors (which only receive, and do not emit,
information), employed actively. Here we use the term active not to denote a
time-of-flight type sensor, but to denote a passive sensor employed in an active
fashion, purposefully changing the sensor's state parameters according to sensing
strategies. Put more succinctly, we are introducing a new paradigm for research in
Machine Perception [4,5] called Active Perception. The new ingredients of this
paradigm are taking multiple measurements and integrating them, and the inclusion
of feedback. Hence the problem of Active Sensing can be stated as a problem of
control strategies applied to a data acquisition process that depends on the
current state of the data interpretation, including recognition. The question may
be asked: “Is Active Sensing only an application of Control Theory?” Our answer is:
“No, at least not of control theory in its simple version.” Here is why:

The feedback is performed not only on sensory data but on complex processed
sensory data, i.e., various extracted features, including relational features.
We do not have complete descriptions of the states of the system. Furthermore,
the models that are used here are a mixture of numeric/parametric and symbolic
information.

But one can say that Active Perception is an application of intelligent control
theory, which includes estimation, reasoning, decision making and control [6].
This approach has been eloquently defended for Computer Vision by
Tenenbaum [21]: “Because of the inherent limitation of a single image, the
acquisition of information should be treated as an integral part of the perceptual
process... Accommodation attacks the fundamental limitation of image
inadequacy rather than the secondary problems caused by it.” Although he
uses the term “Accommodation” rather than “active sensing”, the message is
the same. Before we can outline the problem of active sensing more formally,
we need to spell out the assumptions under which we make the design.

The assumptions are that we have available a priori, or can extract:

1. Models of sensors and all subsequent processing modules, i.e., the physics
and geometry of the modules, including noise and uncertainty considerations.

2. Models of the integration process of different modules, that is, combination
rules and feedback.







3. Explicit specification of the initial and final state/goal and of the task. 

If Active Perception is a theory, what is its predictive power? There are
three components to our theory, each with certain predictions:

1. Models at each processing level are characterized by parameters. These
parameters are estimated using estimation theory and determine the
lower bounds of performance.

2. The combination rules again predict the lower bounds of the final outcome
of the system.

3. The task model and the final state/goal specification guarantee the
termination of the process and predict the cost of accomplishing the
task.
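To make item 1 concrete, here is one standard instance (our choice of illustration, not an example given in the text): if a parameter θ is estimated from n independent measurements corrupted by zero-mean Gaussian noise of known variance σ², the Cramér-Rao inequality bounds the variance of any unbiased estimator from below, and the sample mean attains that bound. The bound depends only on the sensor model (σ) and the amount of data (n), which is the sense in which model parameters fix the best achievable performance.

```latex
\begin{aligned}
x_i &= \theta + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \quad i = 1, \dots, n,\\
I(\theta) &= \frac{n}{\sigma^2}, \qquad
\operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{I(\theta)} = \frac{\sigma^2}{n}.
\end{aligned}
```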

2.1 The Models 

When we speak about models of sensors, we are not restricted to the hardware
but also include the various software modules that play a role in the
processing chain. The following highlights of this work are worth mentioning.

Sensory models: 

1. Physics models. These models represent the mathematical equations
of the physical principles by which the sensors operate. Analysis of these
models provides the range of expected sensor performance if no influences
other than physics are at work. Examples of these models are optics,
illumination, radiance, and forces.

2. Geometric models. Here geometry yields predictions of the best attainable
values. For example, the geometry of a pair of stereo cameras predicts how
depth resolution decreases as a function of distance [19].

3. Ideal Measurement or Signal models. These models help us analyze
and predict the feasibility of detecting certain features. Examples
are edge (step, linear or non-linear) and region (piecewise
constant, or linear or nonlinear, but monotonic) models [9, 17].

4. Noise or Disturbance models. Here we have considered not only the
normal distribution (as everyone else has) but also abnormal distributions,
symmetric or non-symmetric, of the random variables.
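The stereo case in item 2 can be made quantitative with the standard parallel-axis pinhole model, z = bf/d (baseline b, focal length f, disparity d). This model is our assumption; reference [19] may use a different camera model. The sketch computes the first-order depth error caused by a one-pixel disparity step; the baseline and focal length in the demo are hypothetical values.

```python
"""Depth resolution of a parallel-axis stereo pair (illustrative sketch)."""


def depth_from_disparity(baseline_m: float, focal_px: float, disparity_px: float) -> float:
    """Pinhole stereo: z = b * f / d (metres, with f and d in pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_m * focal_px / disparity_px


def depth_resolution(baseline_m: float, focal_px: float, z_m: float,
                     disparity_step_px: float = 1.0) -> float:
    """First-order depth change for one disparity quantization step.

    Differentiating z = b f / d gives |dz/dd| = b f / d**2 = z**2 / (b f),
    so the error grows quadratically with distance.
    """
    return z_m ** 2 * disparity_step_px / (baseline_m * focal_px)


if __name__ == "__main__":
    b, f = 0.2, 800.0  # hypothetical rig: 0.2 m baseline, 800-pixel focal length
    for z in (1.0, 2.0, 4.0, 8.0):
        print(f"z = {z:4.1f} m -> depth step ~ {depth_resolution(b, f, z):.4f} m")
```

Doubling the viewing distance quadruples the depth step, which is the kind of a priori performance bound the geometric models are meant to supply.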

All these models provide upper and lower bounds for expected errors, resolution,
and robustness, which are necessary for making certain decisions, in
particular: “Do we need more data in order to get more accuracy? Can we
afford to take more data based on some economy? Given the errors, how do
we combine different pieces of information in order to improve the overall
performance?” (For details, see Hager [11].)
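The three questions above can be illustrated with a minimal sketch, assuming independent, unbiased measurements with known variances. Estimates are fused by inverse-variance weighting, and a new measurement is taken only while the fused standard deviation exceeds a tolerance and a (hypothetical) acquisition budget allows it. This is a generic illustration of the decision, not Hager's formulation in [11]; the function names, the sensor, and its numbers are ours.

```python
"""Sequential fusion of noisy measurements with a stopping rule (sketch)."""
import math
import random
from typing import Callable, List, Tuple

Estimate = Tuple[float, float]  # (value, variance)


def fuse(estimates: List[Estimate]) -> Estimate:
    """Inverse-variance weighted combination of independent estimates."""
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * x for w, (x, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


def acquire_until_confident(measure: Callable[[], Estimate], tolerance: float,
                            cost_per_sample: float,
                            budget: float) -> Tuple[float, float, int]:
    """Take more data only while more accuracy is needed and still affordable."""
    samples = [measure()]
    spent = cost_per_sample
    value, var = fuse(samples)
    while math.sqrt(var) > tolerance and spent + cost_per_sample <= budget:
        samples.append(measure())
        spent += cost_per_sample
        value, var = fuse(samples)
    return value, var, len(samples)


if __name__ == "__main__":
    rng = random.Random(0)

    def noisy_range_sensor() -> Estimate:
        # Hypothetical sensor: true range 2.0 m, noise standard deviation 0.05 m.
        return rng.gauss(2.0, 0.05), 0.05 ** 2

    est, var, n = acquire_until_confident(noisy_range_sensor, tolerance=0.01,
                                          cost_per_sample=1.0, budget=50.0)
    print(f"estimate {est:.3f} m, std {math.sqrt(var):.4f} m from {n} samples")
```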

Models and estimation theory were applied very successfully by Zucker in
1985 [23]. In this basic work, titled “Theory of Early Orientation Selection”,
Zucker used the model of a contour that comes from differential geometry. He
divides the orientation selection process into three steps: 1) the measurement
step, a series of convolutions; 2) the interpretation of these convolution
values (a functional minimization problem); and 3) finding the integral curve
through the vector field. This decomposition into steps, with the parameters of
each step made explicit, allows Zucker to make clear predictions about where
contours will or will not be found. We very much agree with Zucker's criticism
of the field for the lack of this kind of methodology! The paper of Leclerc and
Zucker [14], which studies the detection of image discontinuities, has the very
same flavor. The work of Binford and Nalwa [16] is again similar in flavor but
applied to the modeling of edges or, more generally, discontinuities.

2.2 A Concrete Example 

A systematic and thorough approach to modeling, as it applies to Active
Vision, is shown in the recent Ph.D. thesis of E. Krotkov [13] at the University
of Pennsylvania. He defined the task of determining spatial layout
using an agile camera system and two cues: range from focus and range
from vergence. He decomposed the problem into three subproblems: 1)
identifying an appropriate model M to represent the spatial layout of the
environment; 2) finding effective methods for constructing M from vision
data; and 3) determining strategies for actively, dynamically, and adaptively
setting sensor parameters for acquiring the vision data.

In this section, we shall review only the first subproblem. Krotkov modeled
two characteristics of objects in the environment, extent and position. This
means encoding a map of the locations of objects with respect to the viewer. In
order to accomplish this, he had to model the details of the sensor (the
camera) as well as the details of the computational process of obtaining range
from focus and range from vergence.

It is not possible to go into all the details of the analysis, but we can
summarize the model as follows:

1. Determine the optics of the lenses, the depth of field, and the accuracy of
object distance (in this setup, the object distance is independent of the
depth of field for distances of 1 to 3 m).

2. Circle of confusion: its diameter depends upon the distance of the object
plane from the focusing distance. For a given distance between the image
and detector planes, the confusion circle is directly proportional to the
diameter of the aperture; in this case the diameter is 58 mm.







3. The spatial resolution of the detector array is another limiting factor
(for the CCD chip used in this work, the width of one photoreceptor
is 0.03 mm, and the focal length f = 105 mm determines the evaluation
window size, typically 20 × 20 pixels).

4. Determine how to measure the sharpness of focus with a criterion function.
After analyzing defocus as an attenuation of high spatial frequencies and
experimentally comparing a number of possible criterion functions, the method
based on maximizing the magnitude of the intensity gradient was chosen. It
proved superior to the others in monotonicity about the mode and in robustness
in the presence of noise. The Fibonacci search technique is then employed to
locate the mode of the criterion function optimally.

5. The distance to an object point, given the focus motor position of
sharpest focus, is modeled by the thick lens law.
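The focus search in item 4 can be sketched as follows, under assumptions of our own: images are plain lists of rows, the focus motor takes integer positions, and the criterion is unimodal in motor position. The criterion is a simple finite-difference version of gradient magnitude (Krotkov's exact operator and any thresholding may differ). `fibonacci_search_max` is a hypothetical helper implementing discrete Fibonacci search: it pads the interval to a Fibonacci length and shrinks it while reusing one interior evaluation per step. `capture_sharpness` is a hypothetical stand-in for a real camera.

```python
"""Depth from focus: gradient-magnitude sharpness + Fibonacci search (sketch)."""
from typing import Callable, Dict, List, Tuple


def gradient_energy(img: List[List[float]]) -> float:
    """Sum of squared finite-difference gradient magnitudes over the window."""
    total = 0.0
    for r in range(len(img) - 1):
        for c in range(len(img[0]) - 1):
            gx = img[r][c + 1] - img[r][c]
            gy = img[r + 1][c] - img[r][c]
            total += gx * gx + gy * gy
    return total


def fibonacci_search_max(f: Callable[[int], float], lo: int, hi: int) -> Tuple[int, int]:
    """Mode of a unimodal f on integers lo..hi; returns (position, evaluations)."""
    cache: Dict[int, float] = {}

    def value(x: int) -> float:
        if x > hi:                  # padding beyond the motor range
            return float("-inf")
        if x not in cache:          # each motor position is evaluated once
            cache[x] = f(x)
        return cache[x]

    fib = [0, 1]
    while fib[-1] < hi - lo:        # pad the interval to a Fibonacci length
        fib.append(fib[-1] + fib[-2])
    k, a = len(fib) - 1, lo         # current bracket is [a, a + fib[k]]
    while k > 3:
        x1, x2 = a + fib[k - 2], a + fib[k - 1]
        if value(x1) < value(x2):
            a = x1                  # mode is right of x1: keep the upper part
        k -= 1                      # otherwise keep [a, x2]
    best = max(range(a, min(a + fib[k], hi) + 1), key=value)
    return best, len(cache)


def capture_sharpness(position: int, best_position: int = 37) -> float:
    """Hypothetical camera: a step edge whose blur width grows with defocus."""
    width = 1 + abs(position - best_position)
    img = [[min(1.0, max(0.0, (c - 16) / width)) for c in range(32)] for _ in range(4)]
    return gradient_energy(img)


if __name__ == "__main__":
    pos, evals = fibonacci_search_max(capture_sharpness, 0, 100)
    print(pos, evals)  # position 37, with far fewer evaluations than scanning all 101
```

Because each step reuses one of the two interior evaluations, the number of criterion evaluations grows only logarithmically with the number of motor positions, which matters when every evaluation means moving the lens and grabbing a frame.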

All the above predictions were experimentally verified on more than 3,000 
points. 
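Items 2 and 5 rest on elementary lens relations, sketched below under our own assumptions: distances are in millimetres, the focus motor has already been calibrated to an image distance v (that calibration is not shown), and the thick lens is treated with the Gaussian law 1/f = 1/u + 1/v, with u and v measured from its principal planes. `principal_offset_mm` is a hypothetical constant that converts u into a distance from a chosen reference. The blur-circle formula is the thin-lens one; it is linear in the aperture diameter, as item 2 states.

```python
"""Lens relations behind range from focus (thin-lens sketch, millimetres)."""


def object_distance(f_mm: float, image_dist_mm: float,
                    principal_offset_mm: float = 0.0) -> float:
    """Solve 1/f = 1/u + 1/v for u; add a hypothetical principal-plane offset."""
    if image_dist_mm <= f_mm:
        raise ValueError("a real image needs v > f")
    u = f_mm * image_dist_mm / (image_dist_mm - f_mm)
    return u + principal_offset_mm


def blur_circle_diameter(aperture_mm: float, f_mm: float,
                         focus_dist_mm: float, object_dist_mm: float) -> float:
    """Blur-circle diameter on the detector when the lens is focused at focus_dist.

    c = A * f * |s - s_f| / (s * (s_f - f)): zero at the focused distance and
    directly proportional to the aperture diameter A.
    """
    return (aperture_mm * f_mm * abs(object_dist_mm - focus_dist_mm)
            / (object_dist_mm * (focus_dist_mm - f_mm)))


if __name__ == "__main__":
    # 105 mm lens and 58 mm aperture from the text; the two distances are illustrative.
    c = blur_circle_diameter(58.0, 105.0, 2000.0, 1500.0)
    print(f"blur {c:.3f} mm = {c / 0.03:.1f} photoreceptor widths")
```

Dividing by the 0.03 mm photoreceptor width quoted in item 3 expresses the blur in detector elements, which is what the sharpness criterion actually sees.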

A very similar exercise, which we omit for lack of space, is the modeling of
the physical relationships for the vergence controller and of the line finder
that is used for matching lines between the two images of the stereo pair.

3 Perception Using Manipulation 

The motivation for this approach is the observation that it is impossible
to discern movable and removable objects/parts without manipulating them.
This problem is rather broad, though fundamental in Perception. In order to
make some progress, we have limited ourselves to a subproblem: how to decide
whether two objects are detachable [22]. We postulate that this cannot be
decided by vision alone or, in general, by any noncontact sensing. An
exception to this is the case when the objects/parts are physically separated
so that the noncontact sensor can measure this separation, or when a great
deal of a priori knowledge about the objects (their geometry, material,
etc.) is available. We assume no such knowledge is available. Instead, we
assume that the scene is reachable with a manipulator. Hence, the problem
represents a class of segmentation problems that occur on an assembly line,
in bin picking, in organizing a desktop, etc.

What are the typical properties of this class of problems? 

1. The objects are rigid. Their size and weight are such that they are
manipulable with a suitable end effector. Their number in the scene is
such that each piece can be examined and manipulated in a reasonable time,
i.e., the complexity of the scene is bounded.








Figure 1: Objects to be segmented. 



2. The scene is accessible to the sensors, i.e., the whole scene is visible,
although some parts may be occluded, and reachable by the manipulator.

3. There is a well defined goal which is detectable by the available sensors. 
Specifically, the goal may be an empty scene, or an organized/ordered 
scene. 

The segmentation problem as specified above is a subclass of the more
general problem of a disassembly task that we wish to address in the future.
As for any perceptual theory, the theory of segmentation using manipulation
must have the following components: models of sensors, world/scene models,
task/utility models, and models of actions. The segmentation process
is formulated in terms of graph-theoretic operations that are mapped into
corresponding manipulatory actions.

1) Models of sensors: these include the characterization of the noncontact
sensor, such as its spatial resolution and signal-to-noise ratio, and the
physical parameters of the different end effectors, such as the vacuum suction








Figure 2: An example of a graph of a dispersed scene. 



cup, the size of the spatula for pushing objects, the span of the gripper, and 
the maximum allowable weight and/or force. 

2) Models of objects: specified in terms of their geometry, size and substance.

3) The Model of our world: this work is limited to arrangements of
objects thrown at random on a plane, called a heap. A scene is then a
(partial) view of a heap. The objects in the scene are represented as nodes
in a digraph, and the arcs denote the on-top-of relation. It is important to
emphasize that this digraph represents relations of only the visible surface
segments, i.e., as they appear through the visual sensor, which is not always
the same as the physical objects and their surface segments. The true physical
arrangement of objects in the scene, as well as the part-whole relations of
objects, are not known.

The scene can be classified, based on analysis of the digraph, into the
following categories: Empty, if there are no nodes in the graph; Dispersed,
if there are no arcs in the graph; Ambiguous, if there is a cycle in the graph;
Overlapped, if at least two nodes are connected by an arc in the graph; and
Unstable. This last category is not tested by analysis of the graph but by
analysis of the contact point/line of the object with the support plane: if
this contact is a point or a line, the scene is classified as unstable.
Figure 2 displays an example of a graph of a dispersed scene.

4) Task models: these specify the final goal of the process. An example of a
final goal is the empty scene; the intermediate goals can then be those scenes
that are simpler, as measured by a cost/benefit function. This cost/benefit
function entails the cost of performing the particular manipulation, and the
benefit is measured via the estimate of the outcome of the manipulation with
respect to the final goal, i.e., emptying the scene.

5) Models of Action: these parametrize the scene/object/manipulation
interaction.
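The classification rules for the scene digraph can be written down directly. In this sketch (ours, with hypothetical names), an arc (u, v) means that u lies on top of v. Labels are returned as a set because the categories overlap (a cycle implies at least one arc). Stability is passed in as a precomputed set of objects whose contact with the support plane was judged to be a point or a line, since that test is not graph-based.

```python
"""Scene classification from the on-top-of digraph (illustrative sketch)."""
from typing import Dict, Iterable, List, Set, Tuple

Arc = Tuple[str, str]  # (upper, lower): upper lies on top of lower


def has_cycle(nodes: List[str], arcs: Iterable[Arc]) -> bool:
    """Depth-first search with three colours; a grey successor closes a cycle."""
    succ: Dict[str, List[str]] = {n: [] for n in nodes}
    for upper, lower in arcs:
        succ[upper].append(lower)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def visit(n: str) -> bool:
        colour[n] = GREY
        for m in succ[n]:
            if colour[m] == GREY or (colour[m] == WHITE and visit(m)):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


def classify_scene(nodes: List[str], arcs: List[Arc],
                   unstable_contacts: Iterable[str] = ()) -> Set[str]:
    """Return every category label that applies to the scene."""
    if not nodes:
        return {"Empty"}
    labels: Set[str] = set()
    if not arcs:
        labels.add("Dispersed")
    else:
        labels.add("Overlapped")      # at least two nodes joined by an arc
        if has_cycle(nodes, arcs):
            labels.add("Ambiguous")
    # Stability comes from contact analysis with the support plane, not the graph.
    if any(n in nodes for n in unstable_contacts):
        labels.add("Unstable")
    return labels


if __name__ == "__main__":
    print(classify_scene(["A", "B", "C"], []))                    # dispersed scene
    print(classify_scene(["A", "B"], [("A", "B"), ("B", "A")]))   # cyclic, ambiguous
```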

In principle there are two kinds of Actions: 1. Sensing action, i.e., data 
acquisition action (look and/or feel), and 2. Manipulatory action. 

The purpose of the manipulatory actions in this paper is to exert a physical
disturbance, either global (as shaking is) or local (pushing/pulling). In view
of our formulation of the segmentation problem as a graph
generation/decomposition problem, we classify the manipulatory actions according
to the operations that apply to the digraph. There are two such operations: node
removal, which in terms of manipulation means removing an object from the scene,
and arc removal, which in turn translates into displacing an object in the scene
so that the on-top-of relationship no longer holds between the two objects. Put
another way, an isomorphism exists between the manipulation actions and
graph decomposition operations. Our approach is to close the loop between
sensing and manipulation. The manipulator is used to simplify the scene by
decomposing it into visually simpler scenes. The manipulator carries the
contact sensors to the region of interest and performs the necessary
exploratory movements that will determine the nature of the mechanical binding
between objects in the region. The Perception-Action interaction is modeled
by a non-deterministic finite state Turing Machine. The model of sensing,
manipulation and control is a Non-deterministic Turing Machine (NDTM), as
shown in Figure 3. The physical world (scene) is the “tape” of the machine,
the “read-from-tape” actions are the sensing actions and the “write-to-tape”
actions are the manipulation actions. The model is a Turing machine because
the manipulation actions constantly change the physical environment (tape)
and hence the machine's own input. The model is non-deterministic because the
state of the scene after each manipulatory step cannot be predicted. From
this, of course, also follows the non-deterministic control of actions.
In addition to the non-determinism of the control strategies, the automaton
has finite states, which are determined by the finite number of recognizable
scenes and the finite number of available actions.
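The correspondence between manipulation and graph operations, together with the cost/benefit choice from the task model, can be sketched as a small sense-decide-act loop. Everything specific here is a hypothetical assumption: the action costs, the benefit measure (number of graph elements removed), the greedy selection rule, and a simulated manipulator whose action succeeds only with some probability, standing in for the unpredictable outcome that makes the model non-deterministic. It is not the authors' controller.

```python
"""Greedy sense-decide-act loop for emptying a heap (hypothetical sketch)."""
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Arc = Tuple[str, str]  # (upper, lower): upper lies on top of lower


@dataclass(frozen=True)
class Scene:
    nodes: FrozenSet[str]
    arcs: FrozenSet[Arc]


@dataclass(frozen=True)
class Action:
    kind: str                # "pick" = node removal, "push" = arc removal
    target: Tuple[str, ...]  # (node,) for a pick, (upper, lower) for a push


COST = {"pick": 1.0, "push": 0.5}  # hypothetical manipulation costs


def candidate_actions(scene: Scene) -> List[Action]:
    """Only uncovered objects can be picked; any on-top-of arc can be pushed apart."""
    covered = {lower for _, lower in scene.arcs}
    picks = [Action("pick", (n,)) for n in sorted(scene.nodes) if n not in covered]
    pushes = [Action("push", arc) for arc in sorted(scene.arcs)]
    return picks + pushes


def expected_outcome(scene: Scene, action: Action) -> Scene:
    """Graph operation that the manipulation is meant to realize."""
    if action.kind == "pick":
        (n,) = action.target
        return Scene(scene.nodes - {n}, frozenset(a for a in scene.arcs if n not in a))
    return Scene(scene.nodes, scene.arcs - {action.target})


def benefit(before: Scene, after: Scene) -> int:
    """Progress toward the empty scene, counted as graph elements removed."""
    return (len(before.nodes) + len(before.arcs)) - (len(after.nodes) + len(after.arcs))


def choose(scene: Scene) -> Optional[Action]:
    """Greedy choice: maximize expected benefit minus cost."""
    actions = candidate_actions(scene)
    if not actions:
        return None
    return max(actions, key=lambda a: benefit(scene, expected_outcome(scene, a)) - COST[a.kind])


def execute(scene: Scene, action: Action, rng: random.Random,
            p_success: float = 0.8) -> Scene:
    """Simulated manipulator: the outcome cannot be known a priori."""
    if rng.random() < p_success:
        return expected_outcome(scene, action)
    return scene  # e.g. the push did not separate the objects


def empty_scene(scene: Scene, rng: random.Random,
                max_steps: int = 50) -> Tuple[Scene, List[Action]]:
    trace: List[Action] = []
    for _ in range(max_steps):
        if not scene.nodes:                  # goal reached: empty scene
            break
        action = choose(scene)               # "read" the tape and decide
        if action is None:
            break
        trace.append(action)
        scene = execute(scene, action, rng)  # "write" the tape
    return scene, trace


if __name__ == "__main__":
    heap = Scene(frozenset({"A", "B", "C"}), frozenset({("A", "B"), ("B", "C")}))
    final, trace = empty_scene(heap, random.Random(1))
    print(sorted(final.nodes), [(a.kind, a.target) for a in trace])
```

Each iteration decides from the scene as it actually turned out, not as it was expected to turn out, which is the point of closing the loop between sensing and manipulation.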

We believe that this model is quite general, provided that one can quantize
the scene descriptions and/or the sensory outputs into unique and mutually
exclusive states and, of course, that one has only a finite number of
manipulatory actions.

There are several advantages to the formalism of the non-deterministic finite
state Turing machine. The first advantage [1] is that the sense-compute-act
formalism allows the control problem to be partitioned in time and complexity.
At any given time, the system deals only with the present state and present
input, produces an output which is a function of the current state and current
input, and moves to a new state. The current state encodes information about
the past history of states and actions of the machine and its environment.
Current sensory input is not deterministic (noise in sensory data). The next
state of the NDTM is not deterministic because the machine modifies its tape
via actions whose outcome cannot be known a priori (push and shake actions).

Figure 3: The Non-Deterministic Turing Machine (NDTM) (labels: Current Implementation; Future, Additional Sensors and Actions)

The second advantage is that the theoretical tools needed to prove the
correctness of the machine's behavior have long been established and tested.
Path sensitization and graph de-cyclization algorithms [10, 12, 8] exist to
prove that (1) the goal state is reachable and (2) the state transition
diagram contains no deadlock states or cycles.
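Both properties can be checked mechanically once the state transition diagram is written down as a directed graph. The Python sketch below uses plain breadth-first and depth-first search rather than the path-sensitization algorithms of [10, 12, 8]; it drops the action labels on the edges, treats a "deadlock" as a reachable non-goal state with no outgoing transition (our assumption), and uses a hypothetical example graph.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first search: every state reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for t in graph.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def deadlocks(graph, start, goals):
    """Reachable non-goal states that have no outgoing transition."""
    return {s for s in reachable(graph, start)
            if s not in goals and not graph.get(s)}


def has_cycle(graph):
    """Depth-first search with three colours; a back edge means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(s):
        colour[s] = GREY  # s is on the current DFS path
        for t in graph.get(s, ()):
            c = colour.get(t, WHITE)
            if c == GREY or (c == WHITE and visit(t)):
                return True
        colour[s] = BLACK  # fully explored, no cycle through s
        return False

    return any(colour.get(s, WHITE) == WHITE and visit(s) for s in list(graph))


# Hypothetical diagram: state -> set of successor states.
graph = {"start": {"a", "b"}, "a": {"goal"}, "b": {"stuck"}, "goal": set()}
```

Here the goal is reachable and the diagram is acyclic, but `stuck` is a deadlock, so this diagram would fail check (2) until a transition out of `stuck` is inserted.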

The third advantage is that it facilitates error handling. If additional
states must be defined to deal with unanticipated error conditions, these
states can simply be inserted.

The fourth advantage is that it is modular and allows insertion of new 
sensors, actions and feedback conditions. 

The fifth advantage is that it makes debugging easy. 

The sixth advantage is that it allows a system to be developed incremen- 
tally. 

One disadvantage is that the number of states and transitions needed to
represent the machine and its environment grows as more sensors are added,
so every added sensor increases the complexity of the model.



4 Exploratory Robotics 

Much of the work in Robotics until now has been conducted, by and large, in
the so-called “knowledge driven” framework. The justification for this
approach was that in the industrial environment the geometry, materials,
environmental conditions and task are (1) quite constrained, (2) known a
priori, and (3) well controllable. This is not the case, however, in many
other robot applications, such as underwater, mine and outer-space
exploration. The common denominator of all these cases is that the robot
must be able to explore and adapt to unconstrained and unknown
environments. This is the motivation for the investigation of Exploratory
Robotics.

4.1 Definition of the Problem 

We wish to investigate the necessary components/modules that must be
embedded into a robot with Exploratory Capabilities. These ideas came from
our collaborative efforts with R. Klatzky and S. Lederman [15]. In other
words, what sensors, exploratory procedures, data processing, data
reduction and interpretation capabilities must such a robot have for a
given task? In full generality, this task is formidable. Hence, we limit
ourselves to two more specific tasks: (1) exploration of the surface
properties of the ground for mobility purposes, and (2) exploration of an
object for manipulatory purposes.

In the first task, we consider surfaces made from materials such as
dirt/soil/sand, rocks/concrete, pebbles/gravel, metals, wood,
glass/ceramics, rubber/polymers, and viscous mixtures (like mud). We do not
consider vegetables, textiles, liquids, or similar materials.

In the second task we restrict the objects by size and weight. This
restriction is determined by the size and flexibility of the end-effector,
i.e., we consider only objects that are graspable. This excludes liquids,
for example, but not deformable objects like a cable or a rubber ball. We
also investigate objects that have two rigid parts joined by a hinge.

For both tasks, the robot will be equipped with one six-degree-of-freedom
manipulator carrying a range finder and/or a pair of CCD cameras, called
the LOOKER, and one six-degree-of-freedom manipulator with a hand, called
the FEELER. Depending on the need, the LOOKER can also carry a color camera
system or any non-contact electromagnetic-wave detector (infrared is one
possibility). The FEELER has a force/torque sensor in its wrist and a hand
with three fingers and a rigid palm. Each finger has one and one-half
degrees of freedom.

The sensors on the hand include a position encoder and a force sensor at
each finger joint, a tactile array at each fingertip and on the palm, a
thermo-sensor on the palm, and an ultrasound sensor on the outer side of
the hand. In addition, the hand can pick up various tools under its own
control.

Both the FEELER and LOOKER are under software control of strategies 
for data acquisition and manipulation. For the first task, we consider a model 
of a foot with a planar sole as one tool that will act as the probe for testing 
the surfaces for mobility. 

4.2 Exploration of Surface Properties 

The Problem 

Given a surface, we wish to establish procedures to determine physical and 
geometric properties with minimal a priori information so that an object like 
a robot or vehicle can move on this surface. The basic assumption is that the 
surface is much larger than the robot and, at least, locally flat so that there 
is space to move around. The flatness assumption is relative to the size of 
the robot: the surface variation from a planar surface must be no more than 
10% (3dB) of the sole of the robot’s foot. We do not consider the problem of 
obstacles. 
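One way to test the flatness assumption from range data is sketched below in Python with NumPy. It assumes (our choice, not specified above) that the LOOKER supplies 3-D surface samples, fits a plane to them by least squares, and compares the largest perpendicular deviation with 10% of a sole dimension; the function name `is_locally_flat` and its parameters are hypothetical.

```python
import numpy as np


def is_locally_flat(points, sole_length, tolerance=0.10):
    """Check the flatness assumption for a patch of ground.

    points: (N, 3) array of x, y, z surface samples.
    sole_length: characteristic size of the robot's foot sole.
    Returns True if no sample lies farther from the best-fit plane
    than tolerance * sole_length.
    """
    pts = np.asarray(points, dtype=float)
    # Design matrix for the plane z = a*x + b*y + c.
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    (a, b, c), *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    # Perpendicular distance of every sample from the fitted plane.
    dist = np.abs(A @ np.array([a, b, c]) - pts[:, 2]) / np.sqrt(a * a + b * b + 1.0)
    return bool(dist.max() <= tolerance * sole_length)


# Example: a 0.5 m x 0.5 m patch sampled on an 11 x 11 grid, gently tilted.
xs, ys = np.meshgrid(np.linspace(0.0, 0.5, 11), np.linspace(0.0, 0.5, 11))
patch = np.column_stack([xs.ravel(), ys.ravel(), 0.1 * xs.ravel()])
```

A uniform tilt does not count against flatness here, since only the deviation from the fitted plane matters; a single 5 cm bump in the same patch would exceed the 2 cm allowance for a 20 cm sole.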








Figure 4: The FEELER and LOOKER. 



Scientific fields older than robotics have investigated how to measure the
attributes of the materials listed in Table 1: mineralogy, geology, soil
science, civil engineering (for testing soil in preparation for building),
and materials science in general. Tests from these fields share the
following procedures: (1) take samples into the laboratory and perform a
multitude of tests; (2) if necessary, perform destructive tests, such as
tests for brittleness, penetrability, or even deformability; and (3)
excavate layered surfaces (as in geology).

The question for this research is which of these procedures are applicable
in our domain. Procedure (1) can be applied in the robotic context: one can
design a robot so that it carries a small testing kit with it. Procedure (2)
is harder to envision, though it could be executed as part of the
calibration process. Procedure (3) is totally inapplicable, since the robot
will not have time to perform excavations before it moves.

We examine those Exploratory Procedures (EPs) which will allow the robot 
to: a) stand firmly on the surface (static stability) and b) move on the surface 
in a stable manner (dynamic stability). 





Table 1: Materials and their salient attributes 



Yes - Attribute exists, is measurable and is a distinguishing property. 

No - Attribute does not exist or is not measurable or is not a distinguishing 
property. 













































Further Assumptions 

In order to further constrain the interpretation of the measurements we elim- 
inate the effect of the geometry, that is, we assume that both the LOOKER 
and the FEELER are perpendicular to the examined surface. 

Exploration for static stability: the Exploratory Procedure for surface
firmness versus penetrability. A penetrable surface can be deformable,
compressible, or both. For example, whereas penetrable materials such as
soil, sand, pebbles, viscous mud, and rubber/polymers are all deformable,
only soil, sand, and pebbles are compressible (see Table 1).

This EP relies on cooperation between vision and force-guided penetration.
The FEELER exerts a controlled and recorded force on the surface while the
LOOKER observes it. If the surface does not change under the given force,
it is firm; if it deforms, it is penetrable, in which case it can be either
deformable or compressible. To discriminate between the latter two, the
LOOKER observes the surface again after the FEELER has withdrawn the
penetrating force. If the surface has not changed from the previous image,
it is deformable (just as mud would stay); otherwise it is compressible.
Naturally, this test is not sufficient, especially when the measurement
indicates that the surface is not firm. Other tests, such as measurements
of pressure, surface roughness and viscosity, must be carried out. Which of
them are necessary and sufficient will be one of the topics of this
research.
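The decision logic of this EP can be written as a short Python sketch. It assumes, as our reading of the text, that the "previous image" is the one taken while the force is applied, that each LOOKER view is reduced to a list of range (depth) samples, and that "changed" means some sample differs by more than a hypothetical tolerance `tol`. It covers only the firm/deformable/compressible split, not the further tests still to be determined.

```python
def changed(view_a, view_b, tol):
    """True if any range sample differs by more than tol between two views."""
    return any(abs(a - b) > tol for a, b in zip(view_a, view_b))


def classify_surface(before, under_load, after_release, tol=1.0):
    """Firm / deformable / compressible decision from three LOOKER views.

    before:        surface before the FEELER presses on it
    under_load:    surface while the controlled force is applied
    after_release: surface after the FEELER has withdrawn the force
    """
    if not changed(before, under_load, tol):
        return "firm"          # no change under the given force
    if not changed(under_load, after_release, tol):
        return "deformable"    # the dent stays, as in mud
    return "compressible"      # the surface changed again after release
```

For example, a flat profile `[0, 0, 0, 0]` that shows a dent `[0, -5, -5, 0]` under load and keeps it after release is classified as deformable.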

4.3 Exploration of Graspable Objects 

The Problem 

We wish to find the following properties of the graspable object: material (its 
hardness and surface texture), density, temperature, weight and size, rigidity 
versus flexibility, and finally gross shape for identifying graspable points. 

In order to accomplish this task one needs two modes of exploration: a 
Static mode and a Dynamic mode. In the Static mode the object is station- 
ary and the LOOKER and the FEELER can look and feel around the object. 
During the Dynamic mode the object is being grasped and manipulated, for 
example lifted or shaken. In the Static mode we can establish the following 
attributes: size, shape, temperature and hardness/surface texture. In the
Dynamic mode the remaining attributes are established: weight, density,
and rigidity versus flexibility.

The Static Exploratory Procedures applied on objects 

Following the work of Allen [2] and Stansfield [20], we accept their finding
that blind touch is unproductive and that tactile exploration should be
guided by vision. Hence we begin with the LOOKER, which gives us the
position, gross shape and size of the object. Using the superquadric
fitting to visual three-dimensional data developed by Solina [18], we
obtain a further parametrization of the data: the orientation, the extent
in three orthogonal directions (the size), and an estimate of the surfaces
of the object (whether they are planar or second-order surfaces). Then,
following Stansfield's EPs for hardness and surface texture and using the
FEELER, we can estimate the material of the object. In addition, by
measuring the conductivity of the material (with another, similar low-level
EP), we can further distinguish metal from non-metal. All these properties
are passed to the next stage, the Dynamic mode.
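For reference, the superquadric model used in this step has a simple inside-outside function, sketched below in Python. The sample points are assumed to be already expressed in the object-centred frame (the pose part of the fit is omitted), and `fit_cost` is a cost of the form used in the superquadric-fitting literature; treat its exact form as an assumption. The minimisation of this cost over the parameters, which is the actual fitting step, is not shown.

```python
def inside_outside(x, y, z, a1, a2, a3, e1, e2):
    """Superquadric inside-outside function F in the object-centred frame.

    F < 1 inside, F == 1 on the surface, F > 1 outside.
    a1, a2, a3: extents along the three axes (the size);
    e1, e2: shape exponents (e1 = e2 = 1 gives an ellipsoid,
    values near 0 a box-like shape).
    """
    xy = (abs(x / a1) ** (2 / e2) + abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2 / e1)


def fit_cost(points, a1, a2, a3, e1, e2):
    """Sum over points of (sqrt(a1*a2*a3) * (F**e1 - 1))**2.

    Zero when every point lies exactly on the superquadric surface.
    """
    scale = (a1 * a2 * a3) ** 0.5
    return sum((scale * (inside_outside(x, y, z, a1, a2, a3, e1, e2) ** e1 - 1)) ** 2
               for x, y, z in points)
```

Points sampled from a unit sphere give a cost of essentially zero for a1 = a2 = a3 = e1 = e2 = 1, and the cost grows as the parameters move away from that shape.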

The Dynamic Exploratory Procedure applied on objects 

As mentioned above, the dynamic EPs measure weight, density and rigidity.
EPs for weight and density: grasp the object and lift it to a height H.
Dividing the exerted force by the gravitational acceleration
(g ≈ 9.8 m/s²) gives the weight (mass) of the object. The weight divided by
the volume (calculated from the shape parameters) gives the density of the
material.
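The arithmetic of this EP is simple once a volume is available. The Python sketch below assumes the shape parameters are those of a superquadric (extents a1, a2, a3 and shape exponents e1, e2) and uses the closed-form superquadric volume 2·a1·a2·a3·e1·e2·B(e1/2 + 1, e1)·B(e2/2, e2/2), where B is the Euler beta function; for e1 = e2 = 1 it reduces to the ellipsoid volume 4π·a1·a2·a3/3. It also assumes the force is measured during a static hold, with the weight of the hand itself already subtracted. The function names are hypothetical.

```python
import math

G = 9.81  # standard gravitational acceleration, m/s^2


def beta(p, q):
    """Euler beta function expressed through the gamma function."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def superquadric_volume(a1, a2, a3, e1, e2):
    """Volume enclosed by a superquadric with extents a1, a2, a3 and
    shape exponents e1, e2."""
    return 2 * a1 * a2 * a3 * e1 * e2 * beta(e1 / 2 + 1, e1) * beta(e2 / 2, e2 / 2)


def mass_and_density(lift_force, a1, a2, a3, e1, e2):
    """Mass from a static lifting force (newtons); density = mass / volume."""
    mass = lift_force / G
    return mass, mass / superquadric_volume(a1, a2, a3, e1, e2)


# Example: 9.81 N measured while holding a sphere of radius 5 cm.
mass, density = mass_and_density(9.81, 0.05, 0.05, 0.05, 1.0, 1.0)
```

In the example the mass is 1 kg and the density comes out near 1.9 × 10³ kg/m³. Comparing this figure with typical densities is one way the Dynamic mode could cross-check the material estimated in the Static mode.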

The more sophisticated Exploratory Procedure is the test for rigidity. We
make a further assumption: an object is either rigid, bent, or composed of
two parts connected by a hinge. This test again involves cooperation
between the LOOKER (vision) and the FEELER (force-guided probe). Several
strategies must be followed, in a specified order:

1. Consider an object which is being translated or rotated on the table by 
pushing (we know the magnitude and direction of the exerted force). 
If the new image can be accounted for by rigid transformations for this 
manipulation, then the object is rigid; otherwise the change must be 
examined. 

2. Examine the change: either the parts are rigid but their spatial
relationship has changed, or the object is bent, i.e., it has deformed.

3. The case of rigid parts indicates that there is either one fixed point of 
rotation, or one fixed line of rotation. In either case we have identified 
a hinged object. 

4. In the case of a bent object, compute the amount of bend. 
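Steps 1 to 3 can be sketched for the planar case of an object pushed on the table. The Python below assumes (hypothetically) that corresponding 2-D points on the object are available before and after the push and that they are already grouped into candidate parts; it fits the least-squares rotation and translation in closed form and calls a motion rigid when the RMS residual is below `tol`. It does not check for a single fixed point or line of rotation, and it does not compute the amount of bend (step 4).

```python
import math


def fit_rigid_2d(before, after):
    """Least-squares rotation + translation mapping `before` onto `after`.

    Both are lists of corresponding (x, y) points.
    Returns (theta, tx, ty, rms_residual).
    """
    n = len(before)
    bx, by = sum(p[0] for p in before) / n, sum(p[1] for p in before) / n
    ax, ay = sum(q[0] for q in after) / n, sum(q[1] for q in after) / n
    s_cos = s_sin = 0.0
    for (x, y), (u, v) in zip(before, after):
        x, y, u, v = x - bx, y - by, u - ax, v - ay  # centre both point sets
        s_cos += x * u + y * v
        s_sin += x * v - y * u
    theta = math.atan2(s_sin, s_cos)  # angle maximising the alignment
    c, s = math.cos(theta), math.sin(theta)
    tx, ty = ax - (c * bx - s * by), ay - (s * bx + c * by)
    err = sum((c * x - s * y + tx - u) ** 2 + (s * x + c * y + ty - v) ** 2
              for (x, y), (u, v) in zip(before, after))
    return theta, tx, ty, math.sqrt(err / n)


def is_rigid_motion(before, after, tol=1e-3):
    return fit_rigid_2d(before, after)[3] <= tol


def classify_change(parts_before, parts_after, tol=1e-3):
    """Step 1: whole object rigid? Steps 2-3: every part rigid (hinge)?
    Otherwise the object is bent."""
    whole_before = [p for part in parts_before for p in part]
    whole_after = [p for part in parts_after for p in part]
    if is_rigid_motion(whole_before, whole_after, tol):
        return "rigid"
    if all(is_rigid_motion(b, a, tol) for b, a in zip(parts_before, parts_after)):
        return "hinged"
    return "bent"
```

With two unit squares sharing an edge, rotating only the second square about the shared corner yields "hinged", whereas moving a single corner of it yields "bent".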

5 Conclusion 

We have defined Active Perception as the problem of an intelligent data
acquisition process. This requires defining and measuring parameters and
errors from the scene, which in turn can be fed back to control the data
acquisition process. This problem is difficult but important. It is
difficult because many of the feedback parameters are context- and
scene-dependent. The precise definition of these parameters depends on a
thorough understanding of the data acquisition devices (camera parameters,
illumination and reflectance parameters), the algorithms (edge detectors,
region growers, 3D recovery procedures), and the goal of the visual
processing. It is important because, with this understanding, one does not
spend time processing and artificially improving imperfect data, but
instead accepts imperfect, noisy data as a matter of fact and incorporates
it into the overall processing strategy.

The second point we made is that manipulation is an essential part of the
perceptual process. The hand, like the eye, is a sensory device.
Consequently, one needs to consider not only signal processing modules but
also basic manipulatory actions, called exploratory procedures, as an
essential ingredient of perceptual theory.

The third and last point we make is a case for Exploratory Robotics.
Today, it is assumed that the size and shape of an object are sufficient
for grasping purposes. It should be apparent that unless one knows what
materials the object is made of, the system may easily be fooled. And even
if we know the material of the outer surface, we do not know the inside,
which may change the weight dramatically and hence the grasping strategy.
Our research aims to fill this gap. The question of rigidity is also
crucial when a grasping strategy is considered. Furthermore, the tests for
hinges and bending are the first steps towards testing the functionality
of an object. In the test for rigidity, we need to explore further what
changes occur when other controlled manipulatory actions, for example
lifting or rotating the object in space, are applied to such objects. All
these steps are part of a general examination of the object, such as
finding its stable positions. Together, these tests lead to an
understanding of the necessary components of a general-purpose Perceptual
Theory.

Acknowledgements

This research was funded in part by the United States Postal Service, 
BOA Contract: 104230-87-H-0001/M-0195, DARPA/ONR grant N0014-85- 
K-0807, NSF grant DCR 8410771, Air Force grant AFOSR F49620-85-K- 
0018, Army grant DAAG-29-84-K-0061, by DEC Corporation, IBM Corpo- 
ration, and LORD Corporation. 



References 

[1] Albus, J., Barbera, A. & Fitzgerald, M. (1982) Programming a Hierarchical Robot Control System, Proceedings of the 12th International Conference on Industrial Robots, Paris, France.

[2] Allen, P.K. (1987) Robotic Object Recognition Using Vision and Touch, Kluwer, Norwell, MA.

[3] Aloimonos, J. & Bandyopadhyay, A. (1987) Active Vision, Proceedings of the 1st IEEE International Conference on Computer Vision, London, U.K., Computer Society Press, pp. 35-54.

[4] Bajcsy, R. (1985) Active Perception vs. Passive Perception, Proceedings of the 3rd Workshop on Computer Vision: Representation and Control, Bellaire, MI, Computer Society Press, pp. 55-59.

[5] Bajcsy, R. (1988) Active Perception, Proceedings of the IEEE, 76, pp. 996-1005.

[6] Bajcsy, R., Gupta, A. & Mintz, M. (1989) Research on Symbolic Inference in Computation Vision, Technical Report, University of Pennsylvania.

[7] Ballard, D.H. (1987) Eye Movements and Spatial Cognition, TR 21, Computer Science Dept., Univ. of Rochester, November 1987.

[8] Deo, N. (1974) Graph Theory with Applications to Engineering and Computer Science, Prentice-Hall, Englewood Cliffs, NJ 07632.

[9] Haralick, R.M. (1981) The Digital Edge, Proceedings of IEEE Computer Conference Pattern Recognition and Image Processing, Computer Society Press, pp. 285-291.

[10] Hartmanis, J. & Stearns, R. (1966) Algebraic Structure Theory of Sequential Machines, Prentice-Hall, Englewood Cliffs, NJ 07632.

[11] Hager, G.D. (1988) Active Reduction of Uncertainty in Multi-Sensor Systems, Ph.D. Dissertation, Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104.

[12] Kohavi, Z. (1970) Switching and Finite Automata Theory, McGraw-Hill, New York.

[13] Krotkov, E. (1987) Exploratory Visual Sensing for Determining Spatial Layout with an Agile Camera System, Ph.D. Dissertation, Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104.

[14] Leclerc, Y.G. & Zucker, S.W. (1989) The Local Structure of Image Discontinuities in One Dimension, IEEE Transactions on PAMI, Vol. 9, No. 3, pp. 341-355.

[15] Lederman, S.J. & Klatzky, R.L. (1987) Hand Movements: A Window into Haptic Object Recognition, Cognitive Psychology, 19, pp. 342-368.

[16] Nalwa, V.S. & Binford, T.O. (1986) On Detecting Edges, IEEE Transactions on PAMI, Vol. 8, pp. 699-714.

[17] Pavlidis, T. & Liou, Y.T. (1988) Integrating Region Growing and Edge Detection, submitted to ICVPR.

[18] Solina, F. (1987) Shape Recovery and Segmentation with Deformable Part Models, Ph.D. Dissertation, Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104.

[19] Solina, F. (1985) Errors in Stereo Due to Quantization, GRASP Lab. memo, Univ. of Pennsylvania, Philadelphia, December 1985.

[20] Stansfield, S.A. (1988) A Robotics Perceptual System Utilizing Passive Vision and Active Touch, International Journal of Robotics Research, 7, pp. 138-161.

[21] Tenenbaum, J.M. (1970) Accommodation in Computer Vision, Ph.D. Thesis, Stanford University.

[22] Tsikos, C. (1987) Segmentation of 3-D Scenes Using Multi-Modal Interaction Between Machine Vision and Programmable, Mechanical Scene Manipulation, Ph.D. Dissertation, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104.

[23] Zucker, S.W. (1985) Early Orientation Selection: Tangent Fields and the Dimensionality of Their Support, TR-85-13-R, Computer Vision & Robotics Laboratory, Dept. of EE, McGill Univ., Montreal, Quebec, Canada.




Object Identification and Search: Animate Vision 
Alternatives to Image Interpretation 

Michael J. Swain, Lambert E. Wixson, and Dana H. Ballard 

Computer Science Department 
University of Rochester 
Rochester, NY 14627, USA 

swain@cs.rochester.edu

Abstract. We are accustomed to thinking of the task of vision as being the construction of 
a detailed representation of the physical world. However, a paradigm that we term animate 
vision argues that vision is more readily understood in the context of the tasks that the sys- 
tem is engaged in, and that these tasks may not require elaborate categorical representations 
of the 3-D world. As an example, we show how the general problem of image interpretation 
can be replaced in many cases by a combination of two simpler problems, identification and 
search. Both tasks use multidimensional color histograms to represent the model and images. 
Color histograms are shown to permit efficient matching and to provide a representation rich
enough to distinguish among a large number of objects.

1 Introduction 

We are accustomed to thinking of the task of vision as being the construction of a detailed 
representation of the physical world that can be used in the execution of various robotic 
tasks. Often this detailed representation is assumed to be one in which every object has 
been identified; this is the goal of a process commonly referred to as image interpretation. In 
contrast to this view, however, is a paradigm that we term animate vision. Animate vision 
argues that the representation needed for a given task is often considerably simpler than the 
elaborate categorical representation that is the goal of image interpretation. In many cases, 
the reason that a simpler representation suffices is related to possession of a mobile camera 
platform [Ballard, 1989]. This paper argues that while image interpretation has proven to
be extremely difficult and computationally complex, there exist two easier tasks, search and
identification, that can often achieve goals that in a traditional view would be handled by
an image interpretation system. Novel implementations of these tasks are presented which,
to achieve fast execution times, rely heavily on color rather than shape to perform object
recognition.

2 Vision as a Collection of Task-Specific Processes 

One motivation for the animate vision view of vision as a collection of task-specific processes 
comes from the major functional divisions found in human and primate brains. A significant 
feature of the gross organization of the primate visual brain is the specialization of the 
temporal and parietal lobes of visual cortex [Mishkin and Appenzeller, 1987; Maunsell and 
Newsome, 1987]. The parietal cortex seems to be subserving the management of locations in 
space whereas the temporal cortex seems to be subserving the identification of objects in the 
case where location is not the issue. In a striking experiment by Mishkin [1987], monkeys 
with parietal lesions fail at a task that requires using a relational cue but have no trouble 
performing a very similar task that requires using a pattern cue. The reverse is true for 
temporal lesions. Why should the primate brain be specialized in this way? If we think
generally about the problem of relating internal models to objects in the world, then one
way to interpret this “What/Where” dichotomy is as a suggestion that image interpretation,
the general problem of associating many models to many parts of the image simultaneously,
is too hard. In order to build vision systems that are computationally tractable within a
single fixation, i.e. that function in real time, perhaps the problem must be simplified.

The goal of image interpretation is to identify and compute the pose of the objects in the
image and from this data to construct a world model for future reference. When we apply a
What/Where simplification, a decomposition into two tasks suggests itself. The first, object
identification, attempts to identify a given portion of an image. The image portion is assumed
to come from some sort of perceptual grouping, such as a segmentation arising from depth,
motion, or intensity cues. The second task, object search, given a certain type of object to
search for, attempts to find such an object either within an image or, more generally, within
the world (by using a mobile camera like that described by Brown [1988]). Table 1 summarizes
the visual tasks that arise from the possible combinations of the What/Where dichotomy.

Rows: image portions (one or many); columns: objects to match against (one or many).

One image portion, one object. Manipulation: trying to do something with an object whose
identity and location are known.
One image portion, many objects. Identification: trying to identify an object whose location
can be fixated.
Many image portions, one object. Search: trying to find a known object that may not be in
view.
Many image portions, many objects. Image interpretation: too hard?

Table 1: The biological organization of cortex into What/Where modules may have a basis
in computational complexity. Trying to match a large number of image segments to a large
number of models at once may be too difficult.

A robot equipped with these two simpler capabilities does not need to resort to image 
interpretation for most applications. To see this, let us consider four common reasons why 
a robot might “decide” that the image produced by its camera(s) should be run through an 
image interpreter. One reason might be that something has attracted the robot’s interest 
to the image. Since it has not yet identified the objects in the image, this “attraction” 
must stem from some more primitive perceptual property of the scene, such as a particular 
texture, a certain arrangement of lines or surfaces, or the presence of motion. In such
cases, however, it is almost always true that the only object that it is really important to
recognize is the one that gave rise to the interesting perceptual property. Since we know
the portion of the image that gave rise to this property, an object recognition system only
needs to be executed on this portion, not on the entire image as with image interpretation.
This is exactly the role of the object identification process described above. 

It might be argued that this rationale may not always work. For example, perhaps nearby 
motion attracted our interest and we identified the moving object as a bird, but what we are 
most concerned with discovering is whether the bird was startled by an approaching tiger. 
In this case, we can still avoid interpreting the entire image by using another member of our 
repertoire of visual processes, object search. This capability allows us to search within the 
image or to deploy our camera with the explicit purpose of finding a tiger. 














A third reason that an image interpretation module might be used is that in many cases
an ambiguous object can be identified better if neighboring objects are also being identified
at the same time and if the labels attached to these neighbors can provide “context”. This
approach is often formulated within paradigms which involve evidence weighting, constraint 
satisfaction, or energy minimization. An example of this might be attempting to identify an 
object which the system thinks is either a workstation or a terminal. In this case, the presence 
of a mouse near the object would provide strong evidence for the workstation interpretation. 
We hypothesize, however, that most of the benefits of this sort of contextual evidence can 
be easily produced by using the object search mechanism. A simple method might be to 
attach information about supporting objects to each object model. Thus, the workstation 
model might have an “annotation” which states that a mouse provides good evidence for this 
model. When the object identification process results in ambiguity, the search mechanism 
would set out to look for objects which can resolve the ambiguity. Thus, in our example, 
the search mechanism might search for a mouse in the area near the object. 
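The annotation scheme described above can be sketched as a small data structure plus a lookup. In the Python below, the annotation table and the `search_near` callable are hypothetical stand-ins: `search_near` represents whatever object-search routine is run in the area near the ambiguous object and simply reports whether a named object was found.

```python
# Hypothetical annotations: for each model, the objects whose presence
# nearby provides good evidence for that interpretation.
ANNOTATIONS = {
    "workstation": {"mouse"},
    "terminal": set(),
}


def disambiguate(candidates, search_near):
    """Resolve an ambiguous identification by searching for supporting objects.

    candidates: model names the identification step could not tell apart.
    search_near: callable(object_name) -> bool, standing in for an object
    search in the area near the ambiguous object.
    Returns the candidates whose supporting evidence was found, or all of
    them if no search succeeded (the ambiguity remains).
    """
    supported = [m for m in candidates
                 if any(search_near(obj) for obj in ANNOTATIONS.get(m, ()))]
    return supported or list(candidates)
```

Only the objects listed in the annotations of the competing models are searched for, so the cost of gathering context scales with the ambiguity rather than with the size of the scene.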

Finally, probably the most prevalent reason for using an image interpreter to construct a 
detailed map is to remember the locations of objects for future reference. There are several 
problems with this view, however. The world may change, causing the representation to 
become invalid. A more serious problem, however, lies with the choice of the coordinates 
with which to denote the locations of the object in the world model; world coordinate systems 
are often impractical due to sensor and effector error [Brooks, 1987]. A fast object search 
process, however, obviates the need for such record keeping. Instead of remembering the 
exact location of the objects you have seen, simply search for objects as you need them. 
Ballard [1989], examining fixation traces of subjects instructed to remember the position of 
objects in a room, has conjectured that humans do not construct a detailed world model 
routinely. It is tempting to speculate that humans compensate by using their object search 
abilities. 

The above four case studies show how two animate vision processes, object identification
and object search, can achieve the same purpose as image interpretation. Note that
we are certainly not claiming that all applications can avoid image interpretation. Robot
applications such as mapping, whose goal is to construct a detailed world model, may
require this capability. Our claim is simply that most robot applications do not require a
world model in which the individual objects are identified and their locations stored. In
these situations, image interpretation is overkill.

In the remainder of the paper, we present two simple but novel methods for identification 
and search. Since both of these algorithms rely on color extensively, we turn first to a 
discussion of color, color spaces, and color histograms. 

3 The Role of Color in Visual Processes 

Color has been neglected recently as an identification cue, although it has been used in earlier
work [Feldman and Yakimovsky, 1974; Garvey, 1976; Ohlander et al., 1978]. One reason for
this may have been the lack of good algorithms for color constancy. Recently, however, there
has been great progress in correcting both for the chromaticity of the illuminant [Maloney and
Wandell, 1986; Forsyth, 1988; Rubner and Schulten, 1989; Hurlbert and Poggio, 1987] and
for geometric effects such as specularity [Klinker et al., 1988; Bajcsy et al., 1989; Healey and
Binford, 1987]. Given that reasonable color constancy can be achieved, color has enormous
value in vision because it is a local surface property that is view invariant and largely
independent of resolution. Shape cues, by contrast, are highly resolution dependent, and
only a highly restricted set of them are view invariant (e.g. corners, zeros of curvature).

Perhaps another reason that color has not been used is that it is not intrinsically related to 
the object’s identity in the way that other cues, e.g., form, are. This view is well represented 
by Biederman [1985]: 

“Surface characteristics such as color and texture will typically have only secondary
roles in primal access ... we may know that a chair has a particular color and texture
simultaneously with its volumetric description, but it is only the volumetric description
that provides efficient access to the representation of CHAIR.”

However, this opinion is easily challenged. There are many examples from nature where
color is used by animals and plants to send clear messages of enticement or warning. The
manufacturing sector uses color extensively in packaging to market goods. Animate vision
systems can also use representations that are heavily personalized to achieve efficient
behaviors. For example, it may not be helpful to model coffee cups in general as being red
and white, but mine is, and that color combination is very useful in locating it.

One way to take advantage of the view invariance of color is to use a color histogram.
Given a discrete color space defined by some color axes (e.g. red, green, blue), the color
histogram is obtained by counting the number of times each color occurs in the image array.
To illustrate, Figure 1 shows the three chromatic channels from a color camera together with
a color histogram obtained from the images. In the example a set of opponent color axes is
used; these are transformations of the red, green and blue axes [Ballard and Brown, 1982].
In a vision system that deals with large variations in lighting, the axes could be the basis
functions for color constancy described in [Maloney and Wandell, 1986].
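A three-dimensional opponent color histogram can be computed in a single pass over the pixels. The Python sketch below assumes 8-bit RGB input and one common choice of opponent axes, rg = r − g, by = 2b − r − g, wb = r + g + b (an assumption, not necessarily the exact transformation used here); each axis is split into `n` equal-width buckets, with 16 matching the bucket count quoted for Figure 1.

```python
from collections import Counter


def _bucket(value, lo, hi, n):
    """Map an integer in [lo, hi] to one of n equal-width buckets 0..n-1."""
    return (value - lo) * n // (hi - lo + 1)


def opponent_histogram(pixels, n=16):
    """3-D opponent colour histogram.

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    Returns a Counter mapping (rg_bucket, by_bucket, wb_bucket) to a count.
    """
    hist = Counter()
    for r, g, b in pixels:
        rg = r - g              # red-green axis, range -255..255
        by = 2 * b - r - g      # blue-yellow axis, range -510..510
        wb = r + g + b          # white-black (intensity) axis, range 0..765
        hist[(_bucket(rg, -255, 255, n),
              _bucket(by, -510, 510, n),
              _bucket(wb, 0, 765, n))] += 1
    return hist
```

For instance, three pure yellow pixels (255, 255, 0) all fall into bucket (7, 0, 10): mid-range on red-green, the extreme yellow end of blue-yellow, and fairly bright.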

Histograms are invariant to translation and to rotation about an axis perpendicular to the image plane, and they change only slowly under changes in viewing angle, changes in scale, and occlusion. Figure 2 shows how the match value between the model histogram and the image histograms varies as the view angle and distance are changed; the match value measures the fraction of pixels that fall into the same histogram buckets. Because histograms change slowly with view, a three-dimensional object can be adequately represented by a small number of histograms, corresponding to the object’s principal views [Feldman, 1985]. Histograms are also efficient to compute. Generating a histogram from a 512 x 485 image takes about 40 milliseconds using MaxVideo image processing hardware, including the time needed to transfer the histogram to the host.
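A minimal sketch of building a three-dimensional opponent color histogram, assuming NumPy and 8-bit RGB input. The opponent transform used here (rg = r − g, by = 2b − r − g, wb = r + g + b), its value ranges, and the helper name `opponent_histogram` are our assumptions; the text does not specify the exact axes. The 16x16x8 bucket layout follows the histogram size quoted in Section 4. Because a histogram only counts colors, any rearrangement of pixel positions, such as an in-plane rotation or a (circular) shift, leaves it unchanged, which illustrates the invariance described above.

```python
import numpy as np

def opponent_histogram(rgb, bins=(16, 16, 8)):
    """Count pixels in a 3-D opponent color space (hypothetical helper).

    rgb: array of shape (H, W, 3) holding 8-bit red, green, blue values.
    Assumed axes: rg = r - g, by = 2b - r - g, wb = r + g + b.
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    samples = np.stack([(r - g).ravel(),
                        (2 * b - r - g).ravel(),
                        (r + g + b).ravel()], axis=1)
    # Full value range of each opponent axis for 8-bit input.
    ranges = [(-255, 255), (-510, 510), (0, 765)]
    hist, _ = np.histogramdd(samples, bins=bins, range=ranges)
    return hist

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30, 3))
h = opponent_histogram(image)

# In-plane rotation and circular translation only move pixels around,
# so the histogram is identical.
rotated_same = np.array_equal(h, opponent_histogram(np.rot90(image)))
shifted_same = np.array_equal(h, opponent_histogram(np.roll(image, 7, axis=1)))
```

A real system would use the camera's own color axes (or color-constant basis functions, as suggested above) in place of the assumed transform.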

Both the object identification and object search implementations described in the following sections identify objects by matching image histograms to model histograms.

Figure 1: Clockwise from upper right: Red, green, and blue bands of the “Arm & Hammer” image. The main body is yellow, the circle containing the hammer is red, the stripe at the top is green, and the lettering and hammer are blue and white. Upper left: Two-dimensional opponent color histogram of the “Arm & Hammer” image, 16 buckets along each axis. The Red-Green axis runs vertically, green at the top, red at the bottom. The Blue-Yellow axis runs horizontally, yellow at the left, blue at the right. The yellow (far left) and black background (center) peaks are the largest; red and green peaks, as well as a small blue peak, are also present (from [Wixson and Ballard, 1989]). Three-dimensional opponent color histograms, used in the object identification section of this paper, have a third White-Black axis.

[Figure 2 plots: match value (0 to 1) versus foot-to-head rotation (deg, -40 to 40), left-to-right rotation (deg, -40 to 40), and distance (cm, 100 to 160).]

Figure 2: Variation of the Histogram Intersection match value as the camera is moved with respect to a Snoopy doll. In the Distance graph the model image was taken at a distance of 124 cm. The match value changes slowly with changes in angle and distance.

4 Object Identification

This section presents a method of matching image histograms to model histograms so as to efficiently recognize an object that appears in a real-world scene. Because the model database may be large, we can only afford a highly restricted amount of processing per model, but at the same time we must be able to overcome the problems that hinder recognition, most importantly:

• distractions in the background of the object, 

• viewing the object from a variety of viewpoints, 

• occlusion, 

• varying lighting conditions. 

The matching method proposed here, called Histogram Intersection, is robust to the first three problems; the last is left to a color constancy module that operates on the input prior to the histogram stage. It is also extremely efficient and easy to implement. Two 16x16x8 histograms can be matched in 2 milliseconds on a SUN SPARCstation 1 (a 12 MIPS RISC machine).

4.1 Description 

Given a pair of histograms, I and M, each containing n buckets, the intersection of the histograms is defined to be

$$\sum_{i=1}^{n} \min(I_i, M_i).$$

The result of the intersection of a model histogram with an image histogram is the number of pixels from the model that have corresponding pixels of the same color in the image. To obtain a fractional match value between 0 and 1 the intersection is normalized by the number of pixels in the model histogram. The match value is then

$$\frac{\sum_{i=1}^{n} \min(I_i, M_i)}{\sum_{i=1}^{n} M_i}.$$
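The normalized match value translates directly into a few lines of NumPy. This is a minimal sketch; the function name `histogram_intersection` is ours, and the four-bucket histograms are made-up numbers chosen so the result is easy to check by hand.

```python
import numpy as np

def histogram_intersection(image_hist, model_hist):
    """Normalized Histogram Intersection: sum_i min(I_i, M_i) / sum_i M_i."""
    I = np.asarray(image_hist, dtype=float).ravel()
    M = np.asarray(model_hist, dtype=float).ravel()
    return np.minimum(I, M).sum() / M.sum()

model = np.array([4, 3, 0, 1])
image = np.array([2, 5, 1, 1])
score = histogram_intersection(image, model)   # (2 + 3 + 0 + 1) / 8 = 0.75
```

Note that the normalization uses only the model's pixel count, which is what makes the measure asymmetric and tolerant of extra image pixels.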

The Histogram Intersection match value is not reduced by distracting pixels in the background. This is the desired behavior, since complete segmentation of the object from the background cannot be guaranteed. Segmentation is still a topic of active research, but the large body of past research indicates that it is likely to remain a difficult problem that is computationally intensive and subject to failure in many situations. A pixel in the background increases the Histogram Intersection match value only if

• the pixel has the same color as one of the colors in the model, and 

• the number of pixels of that color in the object is less than the number of pixels of 
that color in the model. 
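The two conditions above can be checked on a tiny example. This is a sketch with a hypothetical three-bucket histogram: buckets 0 and 1 are model colors, bucket 2 is not, and the object in the image shows fewer pixels of color 1 than the model does.

```python
import numpy as np

def histogram_intersection(image_hist, model_hist):
    I = np.asarray(image_hist, dtype=float)
    M = np.asarray(model_hist, dtype=float)
    return np.minimum(I, M).sum() / M.sum()

model = np.array([10, 5, 0])
obj = np.array([10, 2, 0])     # part of color 1 is hidden in this view
base = histogram_intersection(obj, model)                  # 12 / 15

# Background pixels of a color absent from the model: no effect.
non_model_bg = histogram_intersection(obj + [0, 0, 100], model)
# Background pixels of color 0, which the object already supplies in full: no effect.
saturated_bg = histogram_intersection(obj + [7, 0, 0], model)
# Background pixels of color 1, under-represented in the object: the match rises.
missing_bg = histogram_intersection(obj + [0, 2, 0], model)  # 14 / 15
```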

Figure 2 shows the dependence of Histogram Intersection on viewpoint and occlusion. Because the match value also depends on the apparent size of the object in the image, the model and image histograms should be compared at a common scale. There are a number of ways of determining the approximate depth of an object: laser or sonar range finders, disparity, focus, or touching the object with a sensor. The depth value combined with the known size of the object can be used to scale the model histogram. Alternatively, if it is possible to segment the object from the background and it is not significantly occluded, the image histogram can be scaled to be the same size as the model histogram.
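Both scaling options can be sketched as follows, under the assumption of a pinhole camera, in which the object's image area (and hence every bucket count) falls off with the square of depth. The function names and the numbers are hypothetical; the 124 cm model distance echoes Figure 2.

```python
import numpy as np

def scale_model_histogram(model_hist, model_depth, object_depth):
    """Rescale model counts to the depth at which the object is seen.

    Assumes a pinhole camera: image area, and so each bucket count,
    scales with the square of the depth ratio.
    """
    factor = (model_depth / object_depth) ** 2
    return np.asarray(model_hist, dtype=float) * factor

def scale_image_histogram(image_hist, model_hist):
    """Alternative for a segmented, unoccluded object: equalize pixel totals."""
    image_hist = np.asarray(image_hist, dtype=float)
    return image_hist * (np.sum(model_hist) / image_hist.sum())

model = np.array([400.0, 200.0, 0.0])   # model image taken at 124 cm
seen = np.array([100.0, 50.0, 0.0])     # same object seen at 248 cm
scaled_model = scale_model_histogram(model, 124.0, 248.0)
scaled_image = scale_image_histogram(seen, model)
```

The first option leaves background pixels untouched and so keeps the robustness of Histogram Intersection; the second rescales everything in the image histogram, which is why it needs the object to be segmented first.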

4.2 Representing a Large Database 

Both theoretical and experimental examinations of histogram intersection demonstrate that 
the technique is suitable for indexing into a large database. 

If objects are uniformly distributed in color space and all histograms are of the same (maximum) size, then the number of models that can be represented is at least

$$\frac{1}{(2\delta)^{n-1}},$$

where 1 − δ is the minimum match value allowed and n is the number of buckets in a histogram [Swain and Ballard, 1991]. For any δ significantly smaller than 1/2 this number is huge for the size of n used in the implementation. For instance, if δ = 0.48 and n = 512 (the number of buckets in an 8x8x8 histogram) then the number that can be stored is about 10^9.
Smaller histograms match all the histograms of which they are a subset and therefore reduce 
the total number of objects that can be distinguished. 
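A quick numerical check of the bound for the example values, written as a small Python function (the name `capacity_lower_bound` is ours):

```python
def capacity_lower_bound(delta, n):
    """Lower bound 1 / (2*delta)**(n - 1) on the number of representable models.

    delta: 1 minus the minimum allowed match value; n: number of histogram buckets.
    """
    return 1.0 / (2.0 * delta) ** (n - 1)

# 8x8x8 histogram (n = 512) with a minimum match value of 0.52.
bound = capacity_lower_bound(0.48, 512)
```

Here (0.96)^-511 is roughly 1.1 x 10^9, in line with the text. At δ = 1/2 the bound collapses to 1, which is why δ must be well below 1/2 for the bound to be useful.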

Experiments support the hypothesis that a large number of objects can be represented. For the 66-object database shown in Figures 3-5, the correct model is the best match 90% of the time and is always one of the top two matches. Other, more expensive, matching techniques can be used to verify which of the top-scoring models is the correct one, so it is not crucial that the correct model always be the best match. Notice that the histograms are not all the same size and that the objects are not uniformly distributed throughout color space (for instance, white is much more predominant than in a uniform distribution).

4.3 Efficient Indexing into a Large Database 

Histogram Intersection is efficient compared to most recognition schemes. Nevertheless, for large databases the linear dependence of the recognition time on database size will add up. Parallel processing is one way of attacking this problem, since the match over different models is easily parallelizable. Another way, which reduces the recognition complexity to constant time for a broad range of databases, is described below. In this scheme, called Incremental Intersection, only the largest buckets from the image and model histograms are compared, and a partial histogram intersection value is computed. The computation is incremental, so the algorithm can be interrupted at any time with good results. This last feature could prove extremely important in a system that interacts with the real world, where the times at which actions are taken are often dictated by outside events.

Incremental Intersection is split into two phases, an off-line phase in which the data 
structure representing the database is generated and the on-line matching phase. In the 
off-line phase: 

1. Assign to each bin in each model histogram a key, which is (the number of pixels in the bin) / (the total number of pixels in all bins).

2. Sort the bins by key and split them into T groups. T is a parameter that adjusts the grain at which decisions are made as to which model bins to compare (see below).

3. Associate each group with a table indexed by color histogram bin. The table entries are linked lists pointing to all the histogram bins in the group with that index.

In the on-line phase: 

1. Sort the image histogram bins by size.

2. For the B largest image bins, starting with the largest, match the image bin to all the model bins with the same index and in a table whose maximum key is larger. If a new table is entered in this process, match all the larger image bins to the model bins in that table.

Figure 3: Model indexing experiment based on color cues (continued in the next two figures). Each of the sixty-six models shown here is represented by its color histogram.

Figure 4: The unknown objects. Each is identified with the database color histogram that best matches its own color histogram.

Figure 5: The results of matching all combinations of image and database histograms, displayed pictorially, where the size of each square is proportional to the match value. The dominance of the diagonal values shows that the correct match is almost always selected. Twenty-nine of thirty-two matches are correct; in three cases the correct model received the second-highest score. Models are along the horizontal axis; unknown objects along the vertical axis.

Recognition times (milliseconds)

Database size               19    37    70
Histogram Intersection      38    73   150
Incremental Intersection    15    15    15

Table 2: Recognition times as a function of database size for the standard algorithm, Histogram Intersection, and the fast indexing scheme, Incremental Intersection (B=10, T=30). Timings were made on a SUN SPARCstation 1.

The efficiency of Histogram Intersection and Incremental Intersection is compared in Table 2. A complexity analysis shows that the time for Histogram Intersection depends linearly on the size of the histogram and the size of the database, i.e., the algorithm is O(nm), where n is the size of the histogram and m is the number of models in the database. As long as a bucket indexes only one model in the computation, the complexity of Incremental Intersection is independent of the size of the database. This explains the constant run-times of Incremental Intersection shown in the table. Asymptotically, the number of models each bucket indexes is linearly related to the number of models in the database. Therefore the expected asymptotic complexity of Incremental Intersection is also linear in the number of models (with a small constant factor), that is, O(n log n + cm), where c is the number of image bins used for indexing into the database (see [Swain and Ballard, 1991] for more details).
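The off-line and on-line phases of Incremental Intersection can be sketched as below. This is our reading of the steps, not the authors' implementation: we assume histograms are flattened 1-D arrays, that "a table whose maximum key is larger" means larger than the current image bin's normalized size, and that image bins are normalized the same way as model keys. Python dictionaries of lists stand in for the indexed tables of linked lists, and all names are hypothetical.

```python
from collections import defaultdict
import numpy as np

def build_tables(models, T):
    """Off-line phase. models maps a name to a 1-D model histogram."""
    entries = []
    for name, hist in models.items():
        hist = np.asarray(hist, dtype=float)
        for idx in np.flatnonzero(hist):
            # Key = fraction of the model's pixels that fall in this bin.
            entries.append((hist[idx] / hist.sum(), int(idx), name))
    entries.sort(reverse=True)                        # largest keys first
    tables = []
    for group in np.array_split(np.arange(len(entries)), T):
        if len(group) == 0:
            continue
        table = defaultdict(list)                     # bin index -> model bins
        for k in group:
            key, idx, name = entries[k]
            table[idx].append((name, key))
        tables.append((entries[group[0]][0], table))  # (maximum key, table)
    return tables                                     # decreasing maximum key

def incremental_intersection(image_hist, tables, B):
    """On-line phase: partial intersection scores from the B largest image bins."""
    image = np.asarray(image_hist, dtype=float)
    image = image / image.sum()
    scores = defaultdict(float)
    active = 0          # tables[:active] have been entered
    processed = []      # image bins already matched, largest first
    for idx in map(int, np.argsort(image)[::-1][:B]):
        key = image[idx]
        if key == 0:
            break
        # Enter every table whose maximum key is larger than this image bin,
        # matching the larger image bins already processed against it.
        while active < len(tables) and tables[active][0] > key:
            table = tables[active][1]
            for prev in processed:
                for name, mkey in table.get(prev, []):
                    scores[name] += min(image[prev], mkey)
            active += 1
        # Match the current image bin against every entered table.
        for _, table in tables[:active]:
            for name, mkey in table.get(idx, []):
                scores[name] += min(key, mkey)
        processed.append(idx)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

models = {"a": [8, 1, 1, 0], "b": [0, 1, 1, 8], "c": [3, 3, 2, 2]}
tables = build_tables(models, T=3)
ranking = incremental_intersection([8, 1, 1, 0], tables, B=4)
```

The scores are partial intersections (model "a" scores 0.8 rather than the full 1.0 here), which is the trade-off the scheme makes; stopping the loop early simply leaves fewer bins matched.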

4.4 Improving Discriminability using Salience 

When the objects in the database are similar in many ways, recognition accuracy can be enhanced by concentrating on the cues that distinguish an object from the others. As an example, consider a group of white shirts with different colored logos on them. Simple Histogram Intersection is incapable of distinguishing them, as shown in Figure 6(a). However, if the colors that show up in the rest of the database and are expected in the background are ignored, Histogram Intersection can distinguish them all (Figure 6(b)). In this situation, the salient colors are defined to be those that are unique to an object and distinguish a particular object from the rest of the database.

Figure 6: Experiment in distinguishing six white shirts with different colored logos. (a) Results of Histogram Intersection. (b) Results of modified Histogram Intersection, in which only colors unique to the object model are matched. All the shirts are correctly identified.

The proposed indexing strategy is then: consider all the colors equally to narrow the decision to a smaller number of similar objects, and then apply a salience measure to differentiate among those objects.
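A minimal sketch of this two-stage strategy, assuming 1-D histograms and the simple salience definition above (bins occupied by one model and by no other). Two choices are ours: uniqueness is measured only against the other shortlisted models, and background colors are not excluded. The shirt histograms are hypothetical and chosen so that plain Histogram Intersection picks the wrong shirt.

```python
import numpy as np

def intersection(image, model):
    return np.minimum(image, model).sum() / model.sum()

def salient_mask(models, name):
    """Bins occupied by `name` but by none of the other models given."""
    others = [h for n, h in models.items() if n != name]
    if not others:
        return models[name] > 0
    taken = np.any(np.stack(others) > 0, axis=0)
    return (models[name] > 0) & ~taken

def two_stage_identify(image, models, k=3):
    # Stage 1: plain Histogram Intersection over all colors keeps the k best models.
    ranked = sorted(models, key=lambda n: intersection(image, models[n]), reverse=True)
    shortlist = {n: models[n] for n in ranked[:k]}

    # Stage 2: re-rank the shortlist using only each model's salient bins.
    def salient_score(n):
        mask = salient_mask(shortlist, n)
        if not mask.any():
            return 0.0
        return intersection(image[mask], shortlist[n][mask])

    return max(shortlist, key=salient_score)

# Hypothetical buckets: 0 = white, 1 = red logo, 2 = blue logo, 3 = unused.
models = {"red_shirt": np.array([95.0, 5, 0, 0]),
          "blue_shirt": np.array([80.0, 0, 20, 0])}
image = np.array([95.0, 0, 5, 0])   # mostly white, a little of the blue logo visible
plain = max(models, key=lambda n: intersection(image, models[n]))
chosen = two_stage_identify(image, models, k=2)
```

The plain match favors the red shirt because its white area agrees better; matching only the salient logo colors recovers the blue shirt.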

The simple definition of salience described above is not the only useful one. For instance, 
combinations of histogram bins that are unique to an object could be used instead of the 
single bins used in the example. 

5 Searching for Objects 

Section 2 discussed possible uses for an object search mechanism. Let us now consider object search in more detail. Given a specification of the desired object, the goal of the object search mechanism is to produce a set of gazes (viewing positions and directions) that result in images that are likely to contain the object. Speed is an essential component of this problem: if the mechanism is to be useful, it must find objects quickly. If it cannot, then it would be better to maintain a categorical map of the world such as that produced by image interpretation (despite the problems with coordinate systems described in Section 2) than to search for an object every time it is needed.

To achieve this goal, we have chosen to use a two-phase strategy. In the first phase, a 
fast object detection mechanism is used to detect the likely presence of the object in each 
image without performing pose estimation. The purpose of this phase is to prune out all 
but a small number of the possible gazes. In the second phase, more sophisticated object 
recognition algorithms are applied to the image produced by each gaze in order to confirm
the object’s presence and to compute its pose. This section describes an implementation 
of the first phase only. The object detection mechanism used in our experiments relied on 
matching image histograms to model histograms and was similar to the object identification 
algorithm discussed in the previous section, but modified to be more scale-invariant. (See 
[Wixson and Ballard, 1989] for more details.) 

The search strategy discussed below is a generic search strategy suitable for finding an 
object that is somewhere in a room. Although clearly high-level knowledge such as “Cereal 
boxes are usually on the countertop” can be very useful for object search [Garvey, 1976], 
this knowledge will sometimes fail in unexpected situations. In these situations, some sort of 
default generic searcher must be used. It is in this case that the pruning provided by the object detection mechanism matters most, because no high-level knowledge is available to limit the search to certain portions of the room.

5.1 Sampling the Space of Possible Gazes 

The most obvious generic strategy for object search is to position the robot in the center of 
the room and rotate the gaze 360 degrees around the vertical axis. To carry out object search 
experiments, we have mounted a Pulnix color camera on a Puma 761 robot arm [Brown, 
1988]. The arm is mounted in the center of a 16' x 24' room that contains cluttered scenes 
containing everyday objects. To search the room, the camera is positioned in the center 
of the room and rotated 360 degrees in increments of 15 degrees, with the field-of-view of 
the camera adjusted so that the spatial volumes seen by adjacent increments are spatially 
adjacent. This 360 degree rotation is executed for each of several pitch angles, so that the 
camera can examine the upper walls, the lower walls, and the floor. The object detection mechanism, which produces a “confidence” that the object is in the scene, is run once at each of the gaze configurations.

After this sampling of the set of possible gazes is complete, the confidences are examined 
to determine which gazes produced “significant” confidences. A simple and effective mech- 
anism for this is the criterion that for a confidence to be significant it must be at least one 
average deviation greater than the mean confidence for that object over all of the gazes eval- 
uated. By expressing significance in terms of the distance from the mean, we avoid the use of 
thresholds that may vary with the surroundings (suppose we put new wallpaper or carpeting 
in the room) or with the specific object detection mechanism being used. The resulting set of significant gazes is then pruned further by eliminating gazes for which an adjacent gaze produced a larger confidence. For example, if the set contains gaze A = (orientation = 30°, pitch = 10°, confidence = .3) and gaze B = (orientation = 30 + x°, pitch = 10°, confidence = .7), where x is the rotation increment (usually 15°), then gaze A would be removed from the set. This is done so that when the desired object lies in the views of two gazes, only the better of those gazes is saved. The gazes remaining in the set after this pruning are those considered most likely to view the desired object.
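The significance test and neighbor pruning can be sketched over a grid of confidences indexed by pitch and orientation. The "average deviation" is taken to be the mean absolute deviation from the mean. The adjacency rule (neighboring orientations, wrapping around 360°, plus the same orientation at neighboring pitches) and the comparison against all adjacent gazes rather than only significant ones are our assumptions, and the confidence values are invented.

```python
import numpy as np

def significant_gazes(conf):
    """Select and prune gazes from a grid of detector confidences.

    conf[p, k] is the confidence at pitch index p and orientation index k
    (orientation = k * increment, wrapping around 360 degrees).
    """
    conf = np.asarray(conf, dtype=float)
    mean = conf.mean()
    avg_dev = np.abs(conf - mean).mean()          # average (mean absolute) deviation
    P, K = conf.shape
    kept = []
    for p in range(P):
        for k in range(K):
            c = conf[p, k]
            if c < mean + avg_dev:                # not significant
                continue
            # Assumed adjacency: neighboring orientations (wrapping) and pitches.
            neighbors = [conf[p, (k - 1) % K], conf[p, (k + 1) % K]]
            if p > 0:
                neighbors.append(conf[p - 1, k])
            if p < P - 1:
                neighbors.append(conf[p + 1, k])
            if any(nb > c for nb in neighbors):   # an adjacent gaze did better
                continue
            kept.append((p, k, c))
    return sorted(kept, key=lambda g: g[2], reverse=True)

conf = np.full((3, 24), 0.1)        # 3 pitches x 24 orientations (15 degree steps)
conf[1, 2], conf[1, 3] = 0.3, 0.7   # like gazes A (30 deg) and B (45 deg) in the text
conf[0, 10] = 0.5
gazes = significant_gazes(conf)
```

Here both 0.3 and 0.7 pass the significance test, but the 0.3 gaze is dropped because its neighbor scored higher, mirroring the A/B example.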

5.2 Refining the Gazes 

Since the above strategy samples discrete (although adjacent) volumes of space, it is possible 
that an object may appear only partially in any image. We have implemented a mechanism 
that, given an initial gaze, attempts to adjust the gaze to bring the object entirely into the 
picture. More specifically, the gaze is adjusted so as to maximize the confidence produced 
by the detection mechanism, i.e. to maximize the goodness of the object’s signature. 

This maximization is done by gradient search in the space of possible gazes. Given a 
starting gaze, the arm translates the camera (relative to the camera’s initial gaze) by a small 
fixed amount up, down, left, right, forward and backwards. The detector’s confidence at each 
of these relative positions is saved, and the gaze is finally adjusted by the transformation 
that resulted in the largest increase in the confidence. This process is repeated until no move 
results in an increase in the confidence. 
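The refinement loop is a greedy hill-climb over six unit translations. A minimal sketch, assuming a callable `confidence(position)` that stands in for moving the arm and running the detector; the axis convention, step size, iteration cap, and the synthetic confidence function are all hypothetical.

```python
import numpy as np

# Unit camera translations: up, down, left, right, forward, backward
# (assumed axes: x right, y up, z forward, in the camera's initial frame).
MOVES = [np.array(v, dtype=float) for v in
         [(0, 1, 0), (0, -1, 0), (-1, 0, 0), (1, 0, 0), (0, 0, 1), (0, 0, -1)]]

def refine_gaze(confidence, start, step=1.0, max_iters=100):
    """Greedy hill-climbing on detector confidence over camera translations."""
    pos = np.asarray(start, dtype=float)
    best = confidence(pos)
    for _ in range(max_iters):
        trials = [(confidence(pos + step * d), d) for d in MOVES]
        c, d = max(trials, key=lambda t: t[0])
        if c <= best:                 # no move increases the confidence
            break
        pos, best = pos + step * d, c
    return pos, best

# Synthetic detector whose confidence peaks at offset (3, -2, 1).
peak = np.array([3.0, -2.0, 1.0])
pos, best = refine_gaze(lambda p: -np.sum((p - peak) ** 2), start=(0, 0, 0))
```

Like any greedy search, this stops at a local maximum of the confidence; the fixed step size also limits how precisely the gaze can be centered.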







5.3 Summary 

Our proposed search strategy can now be summarized. To search for a certain object, 

1. Sample the space of possible gazes using a simple and fast object detection mechanism 
which produces a confidence value for each gaze. 

2. Prune the gazes using their confidence values and relative locations, producing a set 
of significant gazes. 

3. For each significant gaze (in order of decreasing confidence), 

(a) Refine the gaze using the simple object detection mechanism; 

(b) Invoke the sophisticated object recognition scheme to determine whether the ob- 
ject is really in the area covered by this gaze, and if so, to obtain the pose of the 
object. 

Steps 1, 2, and 3a have been implemented. The following section presents the results of steps 1 and 2.



5.4 Performance 

Generation of a 2-D opponent color histogram is performed at a rate of approximately three 512 x 480 frames per second by a Datacube MaxVideo real-time image processing system. Matching of
the new histogram to the example histograms of the object being searched for is performed 
on a Sun 3/260. A typical search consists of rotating the camera around the vertical axis in 
increments of 15° at 3 different pitch angles, thereby requiring that 72 gazes be evaluated. 
Our system performs this evaluation in 3.5 minutes, taking just under 3 seconds to move to 
a new gaze, grab the histogram, and compute the match. This time is probably the best 
that can be done with our equipment in its current configuration, since moving to a new 
gaze takes one second, and a half-second delay is required between termination of the move 
and grabbing the frames for the histogram computation in order to allow the vibration of 
the camera to damp out sufficiently that the color will not suffer from motion blur. 
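The time budget above can be checked with simple arithmetic. The figures are those quoted in the text; attributing the remainder of each gaze's time to frame grabbing, histogramming, and matching is our inference, not a measured breakdown.

```python
increment_deg = 15
pitch_angles = 3
gazes = (360 // increment_deg) * pitch_angles   # 24 orientations x 3 pitches

total_s = 3.5 * 60                              # reported time for a full search
per_gaze_s = total_s / gazes                    # "just under 3 seconds" per gaze
fixed_s = 1.0 + 0.5                             # arm move + vibration settling
other_s = per_gaze_s - fixed_s                  # left for grabbing, histogram, match
```

This gives 72 gazes at about 2.9 s each, of which roughly 1.4 s is not accounted for by moving and settling.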







The database of model histograms is usually acquired through an iterative process. Our 
method is to start with a small set of histograms of each object (usually two histograms, 
taken from two different positions), and run the search task. If an object is missed, we cover 
the background with black cloth, leaving the object in the same orientation, histogram the 
scene, and save the scene as an example of the object. By draping the background with 
black cloth, we eliminate the background signal from the histogram. Leaving the object 
in the same orientation ensures that the orientation and/or lighting effects that caused the 
object to be missed originally will be present in the new example histogram. The objects 
are then moved to different positions, the search task is executed again, and new histograms
are learned if necessary. This process is continued until performance is deemed acceptable; 
the end result is that the database contains ~ 4 example histograms for each object. 

Figure 7 shows the direction (but not the distance) of everyday multi-colored objects in 
a cluttered room in relation to the robot for a typical run of the search task. In addition 
to these objects, the room contains many other black, gray, or white objects such as tables, 
cabinets, TV monitors, bookshelves, and chalkboards. Figure 8 shows the gaze directions 
produced when the search strategy is executed for the “Clorox” and “All” detergent containers. In these figures the area of each circle is proportional to the confidence that the gaze in that direction includes the object. The numbers next to the circles give the ranking of the gazes in order of decreasing confidence; a circle with number 0 denotes the gaze that the system judges most likely to contain the object. For both examples the proper gaze is
gaze 0 (the highest-ranked gaze). 

As described above, the manner in which we compile the test database results in the 
elimination of almost all false negatives. The false negatives encountered in practice are 
almost always due either to the presence of objects that massively obscure the signature of 
the object in the image, or to searching for an object that exhibits a large amount of specular 
reflection (for which it is difficult to compute stable example histograms). In typical runs, 
the correct gaze is almost never left out of the final set of gazes. Thus, this detection method 
meets the criterion that there be a low percentage of false negatives. 

In theory, many possible images may give rise to the same histogram and hence to the same signature, resulting in a possibly large number of false positive matches. Our experiments with the searcher and with making forced-choice classifications from a database of images have shown, however, that this does not occur overwhelmingly often. The pruning strategy usually leaves only ~6 out of the 72 possible gazes for further processing¹; these are the gazes that have the highest confidence that they contain the desired object. This is more than a 90% reduction in the set of possible gazes. As stated above, one of these leftover gazes almost always contains the object, although this is not always the gaze that produces the maximum goodness. Thus, the false positives that are generated are not numerous enough to cause the correct gaze to be discarded by the pruning strategy.

Figure 7: Top view of the laboratory environment for a typical test run, showing the direction (but not the distance) of each object with respect to the robot.

Figure 8: Gaze directions produced by the object search mechanism for the “Clorox” and “All” detergent boxes. The area of each circle is proportional to the confidence in that gaze. Numbers next to circles reflect the ordering of the confidences in decreasing order.

5.5 Other Aspects of Object Search 

There are many aspects of object search that have not been discussed in this paper. These include reasoning about occlusion and the use of high-level knowledge about common relationships between objects. For further discussion of these, see [Wixson, 1992; Wixson, 1990].

6 Conclusions 

Object identification and search are two examples of tasks that are much simpler than image interpretation and produce much simpler spatial representations than image interpretation does. Interestingly, however, they suffice in many situations where image interpretation would traditionally be applied. This lends credence to the animate vision hypothesis that many tasks do not require image interpretation in its totality. We have presented promising new techniques for identification and search that are based on a simple color histogram representation.



Acknowledgments 



The “vision group” provided valuable advice and interesting discussions. This work was 
supported by NSF research grant DCR-8602958. 



¹ It is probably the case that up to eight leftover gazes are acceptable, since on average the object will be found in the fourth gaze, and therefore a conventional recognition method that takes a minute to process a scene will take an average of four additional minutes (for a total of 7.5 minutes) to find the object.








References 

[Bajcsy et al., 1989] Ruzena Bajcsy, Sang Wook Lee, and Ales Leonardis, “Image Segmentation with Detection of Highlights and Inter-Reflections Using Color,” In Image Understanding and Machine Vision, 1989 Technical Digest Series, Vol. H, pages 16-19. Optical Society of America, June 1989.

[Ballard, 1989] Dana H. Ballard, “Reference Frames for Animate Vision,” In International Joint Conference on Artificial Intelligence, pages 1635-1641, 1989.

[Ballard and Brown, 1982] D.H. Ballard and C.M. Brown, Computer Vision, Prentice Hall, 1982.

[Biederman, 1985] I. Biederman, “Human Image Understanding: Recent Research and a Theory,” Computer Vision, Graphics and Image Processing, 32(1):29-73, 1985.

[Brooks, 1987] Rodney A. Brooks, “Visual Map Making for a Mobile Robot,” In Martin A. Fischler and Oscar Firschein, editors, Readings in Computer Vision, pages 438-443. Morgan Kaufmann, 1987.

[Brown, 1988] Christopher M. Brown, “The Rochester Robot,” Technical Report 257, De- 
partment of Computer Science, University of Rochester, 1988. 

[Feldman and Yakimovsky, 1974] J. A. Feldman and Yoram Yakimovsky, “Decision Theory and Artificial Intelligence: I. A Semantics-Based Region Analyzer,” Artificial Intelligence, 5:349-371, 1974.

[Feldman, 1985] Jerome A. Feldman, “Four Frames Suffice: A Provisional Model of Vision and Space,” The Behavioral and Brain Sciences, 8:265-289, 1985.

[Forsyth, 1988] D. A. Forsyth, “A Novel Approach to Colour Constancy,” In Proceedings of the IEEE International Conference on Computer Vision, pages 9-18, 1988.

[Garvey, 1976] Thomas D. Garvey, “Perceptual Strategies for Purposive Vision,” Technical Note 117, SRI International, 1976.

[Healey and Binford, 1987] Glenn Healey and Thomas O. Binford, “The Role and Use of Color in a General Vision System,” In Proceedings of the DARPA IUS Workshop, pages 599-613, 1987.

[Hurlbert and Poggio, 1987] Anya C. Hurlbert and Tomaso A. Poggio, “Learning a Color Algorithm from Examples,” In Neural Information Processing Systems, pages 622-631, 1987.

[Klinker et al., 1988] Gudrun J. Klinker, Steven A. Shafer, and Takeo Kanade, “The Measurement of Highlights in Color Images,” International Journal of Computer Vision, 2:7-32, 1988.







[Maloney and Wandell, 1986] Laurence T. Maloney and Brian A. Wandell, “Color Constancy: A Method for Recovering Surface Spectral Reflectance,” Journal of the Optical Society of America A (JOSA-A), 3(1):29-33, January 1986.

[Maunsell and Newsome, 1987] John H. R. Maunsell and William T. Newsome, “Visual Processing in Monkey Extrastriate Cortex,” Annual Review of Neuroscience, 10:363-401, 1987.

[Mishkin and Appenzeller, 1987] Mortimer Mishkin and Tim Appenzeller, “The Anatomy of Memory,” Scientific American, 256(6):80-89, June 1987.

[Ohlander et al., 1978] R. Ohlander, K. Price, and D.R. Reddy, “Picture Segmentation Using a Recursive Region Splitting Method,” Computer Graphics and Image Processing, 8:313-333, December 1978.

[Rubner and Schulten, 1989] J. Rubner and K. Schulten, “A Regularized Approach to Color Constancy,” Biological Cybernetics, 61:29-36, 1989.

[Swain and Ballard, 1991] Michael J. Swain and Dana H. Ballard, “Color Indexing,” International Journal of Computer Vision, 7:11-32, 1991.

[Wixson, 1990] Lambert E. Wixson, “Autonomous Acquisition and Utilization of High-Level Spatial Knowledge to Search for an Object,” Technical Report 338, University of Rochester Computer Science Department, April 1990.

[Wixson, 1992] Lambert E. Wixson, “Exploiting World Structure to Efficiently Search for Objects,” Technical Report 434, University of Rochester Computer Science Department, July 1992.

[Wixson and Ballard, 1989] Lambert E. Wixson and Dana H. Ballard, “Real-Time Detection of Multi-Colored Objects,” In SPIE Sensor Fusion II: Human and Machine Strategies, volume 1198, November 1989.




A MODEL OF HUMAN FEATURE DETECTION 
BASED ON MATCHED FILTERS 



M. C. Morrone and D. C. Burr

Istituto di Neurofisiologia del CNR
Via S. Zeno 51, 

Pisa, Italy 



It is generally accepted that edge and line detection is 
an important stage of any visual system, biological or 
artificial. Many algorithms have been developed, either to 
simulate how humans may detect lines and edges, or as a stage 
in artificial image processing (see Hildreth, 1985). Most
algorithms convolve the input image with operators of limited 
bandwidth, and search either for zero-crossings or peaks in 
the output. 

There are several limitations inherent to all current algorithms. Convolution by an operator of limited bandwidth introduces ripples into the waveform, which can lead to spurious peaks and zero-crossings, and hence to false marking of features. A further limitation is that an operator designed to detect edges will also tend to mark lines (at an inappropriate position), and vice versa. For example, after convolution with a Difference-of-Gaussian operator, a line will cause a major peak corresponding to the centre of the line (the real feature), two strong zero-crossings on either side (whose distance from the peak depends on the spatial scale of the operator) and two weaker peaks further out from the zero-crossings. The spurious zero-crossings and peaks can lead to false marking of edges and lines. To minimize false positives, attempts have been made to optimize the bandwidth and shape of operators, trading off precision of localization against signal detection efficiency (Canny, 1983). Other schemes to minimize false positives include thresholding, requiring correspondence across spatial scales (Marr and Hildreth, 1980; Marr, 1982; Yuille and Poggio, 1985), using interpretation rules (Watt and Morgan, 1985), and resynthesizing images convolved with different types of operators (Canny, 1983). None of these schemes is perfect, and all face severe problems with certain feature configurations, such as adjacent edges of the same polarity, and combinations of edges and lines at the same point.
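The spurious zero-crossings described above are easy to reproduce. This sketch convolves a one-sample-wide line with a Difference-of-Gaussian kernel; the Gaussian widths (2 and 3.2 samples) and the signal length are arbitrary choices for illustration.

```python
import numpy as np

def gaussian(x, sigma):
    return np.exp(-x ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.arange(-15, 16)
dog = gaussian(x, 2.0) - gaussian(x, 3.2)   # centre minus surround (assumed widths)

line = np.zeros(61)
line[30] = 1.0                              # a thin bright line at position 30
response = np.convolve(line, dog, mode="same")

signs = np.sign(response)
signs = signs[signs != 0]                   # ignore exact zeros beyond the kernel
zero_crossings = int(np.count_nonzero(signs[1:] != signs[:-1]))
```

The response has its main peak at the line and two zero-crossings flanking it, each of which an edge detector searching for zero-crossings would mark as an edge.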

We have recently proposed a feature detection model for 
one- dimensional images, in which lines and edges are detected 
simultaneously by the same algorithm (Morrone and Burr, 1988; 
Morrone and Owens, 1987; Burr and Morrone, 1990). The 
algorithm has two stages, one linear and one non- linear. At 
the linear stage the image is convolved separately by two sets 
of operators with identical Fourier amplitude spectra, but 
with phase spectra which differ by /r /2 (i.e. they are related 
by the Hilbert transform, and hence orthogonal functions in 
l 2 ) . One of these operators is an even- symmetric function, 
designed to respond best to lines, the other odd- symmetric , 
designed to respond best to edges. After the convolutions, the 
output is squared separately and summed, to give the square of 
what we term the "Local energy" profile (following Adelson and 
Bergen, 1985) . We have shown that for a wide range of 
patterns, local peaks in the energy function occur at points 
where visually salient features are perceived by human 
observers. These are the only peaks in the function; the 
spurious ripples introduced by the convolution process are 
annulled when the squared filtered images are summed. Thus to 
detect and locate features, it is sufficient to search for 
local maxima in the energy function, without the need for 
thresholding or use of interpretation rules to distinguish 
real features from false ones. 
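The two stages can be sketched for a periodic one-dimensional signal. This is a minimal illustration under stated assumptions, not the authors' implementation: the log-Gaussian amplitude spectrum and its parameters (peak 0.125 c/sample, σ = 0.53 ln units) are assumed, both convolutions are done in the frequency domain with a naive DFT, and the odd operator is obtained from the even one by a phase shift of -sign(f)·π/2, i.e. a Hilbert transform.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2), adequate for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def log_gaussian(f, peak=0.125, sigma=0.53):
    """Assumed amplitude spectrum: Gaussian on a log-frequency axis, zero at DC."""
    if f == 0:
        return 0.0
    return math.exp(-math.log(abs(f) / peak) ** 2 / (2 * sigma ** 2))


def local_energy(signal, peak=0.125, sigma=0.53):
    """Return (even, odd, energy) responses of a quadrature (Hilbert) filter pair."""
    n = len(signal)
    spec = dft(signal)
    even_spec, odd_spec = [], []
    for k in range(n):
        f = k / n if k <= n // 2 else (k - n) / n    # signed frequency, c/sample
        a = log_gaussian(f, peak, sigma)
        # sign(f), taken as 0 at DC and at the Nyquist bin so the output stays real
        sgn = 1 if 0 < 2 * k < n else (-1 if 2 * k > n else 0)
        even_spec.append(a * spec[k])
        odd_spec.append(-1j * sgn * a * spec[k])     # same amplitude, phase shifted by pi/2
    even = [z.real for z in idft(even_spec)]
    odd = [z.real for z in idft(odd_spec)]
    energy = [math.hypot(e, o) for e, o in zip(even, odd)]
    return even, odd, energy


def local_maxima(values):
    """Indices of local maxima, treating the signal as periodic."""
    n = len(values)
    return [i for i in range(n)
            if values[i] > values[i - 1] and values[i] >= values[(i + 1) % n]]
```

For a single bright line (a delta function at sample 40 of a 128-sample signal), the largest value of `energy` falls at sample 40, where the odd response is zero and the even response is positive.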

One of the main properties of this model is that both
edges and lines cause peaks in local energy (unlike standard
algorithms), so combinations of edges and lines (which can
occur naturally in shadows and conditions of oblique lighting)
do not counteract each other to annul features. To determine
whether the feature is an edge or a line or both, it is
necessary to return to the linear stage of the algorithm, and
evaluate the amplitude of the convolved images at the points
of local energy peaks. Non-zero amplitude in the even-symmetric
convolution implies a line; non-zero amplitude in the
odd-symmetric convolution implies an edge. If both are
non-zero, the feature is a combination of line and edge.
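This classification step can be sketched as follows, given the even- and odd-symmetric responses at every sample of a periodic signal. The relative threshold `rel_tol` that stands in for "non-zero" is a hypothetical parameter (the text does not specify one), and which sign of the odd response counts as a positive-going edge depends on the sign convention of the odd operator, which is assumed here.

```python
import math


def classify_features(even, odd, rel_tol=0.2):
    """Label every local maximum of energy = hypot(even, odd) in a periodic signal.

    rel_tol is a hypothetical threshold: a response counts as non-zero when
    its magnitude exceeds rel_tol times the local energy.
    """
    n = len(even)
    energy = [math.hypot(e, o) for e, o in zip(even, odd)]
    labels = []
    for i in range(n):
        # keep only local energy peaks (indices wrap around at the ends)
        if not (energy[i] > energy[i - 1] and energy[i] >= energy[(i + 1) % n]):
            continue
        parts = []
        if abs(even[i]) > rel_tol * energy[i]:
            parts.append("bright line" if even[i] > 0 else "dark line")
        if abs(odd[i]) > rel_tol * energy[i]:
            parts.append("positive edge" if odd[i] > 0 else "negative edge")
        labels.append((i, " + ".join(parts)))
    return labels
```

Because the squared even and odd fractions of the energy sum to one, at least one label is always assigned as long as `rel_tol` stays below about 0.7.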

The energy model has been shown to predict various 
perceptual phenomena in one-dimensional images, such as Mach 
bands (Morrone et al. 1986; Ross, Morrone and Burr, 1989), the 
Chevreul illusion (Morrone and Owens, 1987) and the 
Craik-Cornsweet illusion (Burr, 1987; Burr and Morrone, 1990), 
both quantitatively and qualitatively. It also predicts the 
perceived positions of complex combinations of lines and edges 
(Morrone and Burr, 1988), where other current models perform
poorly. 

The theoretical and mathematical justification for the
energy model is given in previous papers (Morrone and Burr,
1988; Morrone and Owens, 1987), which show that local maxima of
the energy function occur at points on the waveform where the
Fourier harmonics come into phase. This suggests that maximal
similarity in Fourier arrival phase (or argument) may be a
useful definition of visual features. For edges, the average
arrival phase is π/2, and for lines it is zero. This
definition holds both for ideal edges and lines (step and delta
functions) and for other features perceived as lines or edges
(such as the illusory Mach bands, perceived at points where
luminance ramps change slope). For all the waveforms we have
tested to date, "arrival phase congruency" has proven to be a
satisfactory definition of features, which helps explain the
importance of phase in visual signals (Oppenheim and Lim,
1981).
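Arrival phase congruency can be made concrete with a small sketch. Each harmonic a·cos(2πkx/T + φ) has arrival phase θ = 2πkx/T + φ at position x; the amplitude-weighted resultant of the phasors e^{iθ}, divided by the sum of the amplitudes, equals 1 only where all harmonics arrive in phase. The function names and this normalization are illustrative assumptions. The square-wave edge below has arrival phase π/2 at x = 0; the opposite polarity would give -π/2 (equivalently 3π/2).

```python
import cmath
import math


def phasor_sum(harmonics, x, period):
    """Sum of a*exp(i*theta) over harmonics (k, a, phi) = a*cos(2*pi*k*x/period + phi),
    where theta = 2*pi*k*x/period + phi is the arrival phase at x."""
    return sum(a * cmath.exp(1j * (2 * math.pi * k * x / period + phi))
               for k, a, phi in harmonics)


def phase_congruency(harmonics, x, period):
    """1.0 when all harmonics arrive at x with the same phase, near 0 when they cancel."""
    return abs(phasor_sum(harmonics, x, period)) / sum(a for _, a, _ in harmonics)


def mean_arrival_phase(harmonics, x, period):
    """Amplitude-weighted circular mean of the arrival phases at x, in (-pi, pi]."""
    return cmath.phase(phasor_sum(harmonics, x, period))


# Harmonics as (k, amplitude, phase): a line (delta train) and an edge (square wave)
line = [(k, 1.0, 0.0) for k in range(1, 33)]
edge = [(k, 1.0 / k, math.pi / 2) for k in range(1, 33, 2)]
```

At x = 0 both waveforms give a congruency of 1, with mean arrival phase 0 for the line and π/2 for the edge; a quarter of a period away, the line's harmonics cancel and the congruency drops to essentially zero.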




The purpose of this paper is to extend the model to 
two-dimensional images, using operators like those of the 
one-dimensional model, with identical amplitude spectra but in 
quadrature phase. This constraint in two-dimensional space 
automatically introduces an orientation bias in the operators, 
so the image must be analysed by several operators of 
different preferred orientations. As with the one-dimensional 
model, we chose the parameters of the two-dimensional model to 
parallel as closely as possible those of the human visual 
system. 



IMPLEMENTATION AND RESULTS 

The proposed feature detection algorithm is similar to 
that developed for one-dimensional images. The local energy 
function is calculated by convolving the image with orthogonal 
operators of equal amplitude spectrum, and then taking the
Pythagorean sum of the two separate outputs. Features are
marked by the peaks of local energy, separately at four 
different orientations and three spatial scales. The feature 
maps are summed over orientation, to give a scale-by-scale 
description of the image, and can be summed over scales to 
give a global feature map. 

Oriented matched filters 

The first stage of the model is a linear convolution with
two sets of masks, one odd-symmetric about a given plane, the
other even-symmetric. The parameters for the masks were chosen
from psychophysical and electrophysiological data from human
and other primate visual systems. Both operators were
selective to both orientation and spatial frequency. The
amplitude spectra of the masks were given by equation 1, an
equation that fits a range of psychophysical and
electrophysiological data (e.g. Anderson and Burr, 1985, 1987;
Hubel and Wiesel, 1977).

$$ r(u,v) = g(u)\,f(v) \qquad (1) $$

$$ g(u) = \exp\left(-\frac{\ln^2(|u|/\mu)}{2\sigma_u^2}\right) $$

$$ f(v) = \exp\left(-\frac{v^2}{2\sigma_v^2}\right) $$

u and v are the frequency co-ordinates oriented along the
preferred and non-preferred orientations respectively. g(u),
the spatial frequency tuning along the preferred orientation,
is a Gaussian function on a logarithmic scale with a full
bandwidth at half height of 2 octaves (given by σ_u = 0.53 ln
units). The peak frequency μ was 0.18 c/pixel. f(v), the spatial
frequency selectivity orthogonal to the preferred orientation,
is Gaussian on the linear axis, with standard deviation σ_v
(in c/pixel). The orientation full bandwidth of r(u,v) at half
height is 70°.

Each image was analysed with pairs of masks at four
orientations (0, -45, +45 and 90°) and three spatial scales
(two octaves apart). Given the bandwidth of the operators, all
useful spatial frequencies at all orientations were conserved
at the linear stage, without excessive selective attenuation.
The convolution masks were obtained by inverse Fourier
transform of the amplitude spectra of equation 1, assuming a
constant phase spectrum of either 0 or sign(u)π/2 (where u is the
rotated coordinate at the preferred orientation). The phase
spectrum is consistent with experimental data (Field and
Nachmias, 1984; Burr, Morrone and Spinelli, 1989). Thus for a
given orientation, the even- and odd-symmetric masks have
identical amplitude spectra, and are related via the Hilbert
transform. All oriented masks were 31 × 31 pixels and had the
same preferred spatial frequency (at the preferred orientation).
Analysis at different scales was achieved by the
Gaussian pyramid technique, described in the next section.
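A sketch of how one even/odd mask pair might be generated from equation 1: sample r(u,v) on the DFT grid of an n × n mask (n odd), apply a phase of 0 (even mask) or sign(u)π/2 (odd mask), and invert with a direct sum. μ = 0.18 c/pixel and σ_u = 0.53 follow the text; σ_v = 0.1 c/pixel is an assumed value, and which way a negative angle tilts the mask depends on the assumed convention that rows increase downwards. The direct O(n^4) inverse transform is used only to keep the example free of dependencies.

```python
import math


def oriented_masks(theta_deg, n=31, mu=0.18, sigma_u=0.53, sigma_v=0.1):
    """Even- and odd-symmetric n x n masks (n odd) from the amplitude spectrum of eq. 1.

    theta_deg is the preferred orientation measured from the vertical; rows are
    assumed to increase downwards. sigma_v = 0.1 c/pixel is an assumed value.
    """
    half = n // 2
    c, s = math.cos(math.radians(theta_deg)), math.sin(math.radians(theta_deg))
    spectrum = []                          # (fx, fy, amplitude, sign(u)) on the DFT grid
    for ky in range(-half, half + 1):
        for kx in range(-half, half + 1):
            fx, fy = kx / n, ky / n        # cycles per pixel
            u = fx * c + fy * s            # along the preferred orientation
            v = -fx * s + fy * c           # orthogonal to it
            g = 0.0 if u == 0 else math.exp(-math.log(abs(u) / mu) ** 2 / (2 * sigma_u ** 2))
            f = math.exp(-v ** 2 / (2 * sigma_v ** 2))
            spectrum.append((fx, fy, g * f, (u > 0) - (u < 0)))
    even = [[0.0] * n for _ in range(n)]
    odd = [[0.0] * n for _ in range(n)]
    for y in range(-half, half + 1):
        for x in range(-half, half + 1):
            e = o = 0.0
            for fx, fy, amp, sgn in spectrum:
                arg = 2 * math.pi * (fx * x + fy * y)
                e += amp * math.cos(arg)           # phase spectrum 0
                o -= sgn * amp * math.sin(arg)     # phase spectrum sign(u)*pi/2
            even[y + half][x + half] = e / n ** 2
            odd[y + half][x + half] = o / n ** 2
    return even, odd
```

`oriented_masks(0)` and `oriented_masks(-45)` correspond to the 0° and -45° pairs of figure 1, up to the assumed σ_v. By construction the even mask is point-symmetric and the odd mask antisymmetric, both have zero mean (g vanishes at u = 0), and the two are orthogonal.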

Figure 1 reproduces contour plots of four examples of the
masks. Masks A and B are oriented at 0° (with respect
to the vertical); mask A is even-symmetric around the
vertical plane, and mask B is odd-symmetric. Masks C and
D are the even- and odd-symmetric masks at -45° orientation.
Note that the masks at different orientations are not exactly
identical, because of undersampling on the square pixel grid.

FIGURE 1 

Contour plots of four of the eight masks used for
convolution. A and B are oriented vertically (0°), C and D at
-45°. The masks were calculated by inverse Fourier transform of
the amplitude spectra given in equation 1, assuming a constant
phase spectrum (see text). Masks A and C are even-symmetric
about the preferred orientation, with a central positive
region flanked by adjacent negative regions. Masks B and D are
odd-symmetric, comprising adjacent regions of alternating
polarity. Although only 13 × 13 pixels are depicted, the masks
spanned 31 × 31 pixels, to minimize truncation effects.

FIGURE 2

Contour plots of local energy functions in response to a
bright dot. The upper function was obtained with vertically
oriented masks, the lower with masks oriented at -45°.
Features are marked at local maxima at the orientation of the
steepest gradient. For both orientations, the only feature
marked is at the centre of the blobs, which corresponds to the
position of the dot.


Figure 2 illustrates the application of the algorithm to a
simple image, a small isolated dot. The dot was first
convolved by the four pairs of orthogonal operators, producing
outputs virtually identical to the masks themselves (shown in
figure 1). At each orientation, local energy functions were
calculated by taking the square root of the sum of the squared
outputs of the even- and odd-symmetric operators. Figure 2
shows examples of energy functions at orientations 0° (A) and
-45° (B). The functions are smooth blobs slightly elongated
along the non-preferred orientation of the masks. Features are
marked by maxima in the local energy functions along the
orientation where the energy has maximum gradient (using the
interpolation algorithm of Canny, 1983). This procedure
minimizes overestimation of feature length, a common problem
with elongated masks (see Canny, 1983). By inspection it is
obvious that for both energy maps the only maximum along the
direction of maximum energy gradient is at the centre of the
blobs, which corresponds to the position of the input dot
(without smear along the direction of the oriented masks). The
final feature map is simply the sum of the features marked at
the four orientations. No thresholding is necessary.
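The marking step can be sketched as a simple non-maximum suppression on the energy map of one orientation. As a simplifying assumption, the gradient direction is quantized to 0°, 45°, 90° or 135° instead of using Canny's (1983) interpolation, and border pixels are never marked; the final map is the pixel-wise sum over orientations.

```python
import math

# (row, column) step for each quantized gradient direction; rows increase downwards.
STEPS = {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}


def mark_features(energy):
    """Keep the energy value at pixels that are maxima along the local gradient."""
    h, w = len(energy), len(energy[0])
    marks = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = energy[y][x + 1] - energy[y][x - 1]       # central differences
            gy = energy[y + 1][x] - energy[y - 1][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            dy, dx = STEPS[int(round(angle / 45.0)) % 4 * 45]
            here = energy[y][x]
            ahead, behind = energy[y + dy][x + dx], energy[y - dy][x - dx]
            if here > 0 and here > behind and here >= ahead:
                marks[y][x] = here                           # strength = local energy
    return marks


def sum_maps(maps):
    """Final feature map: pixel-wise sum of the maps marked at each orientation."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*maps)]
```

For an isotropic Gaussian energy blob centred at (5, 5) in an 11 × 11 map, only the centre pixel survives, mirroring the single feature marked in figure 2.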

Spatial scales 

As mentioned earlier, the operators are selective to
spatial frequency as well as to orientation, with a bandwidth
of about 2 octaves (like human visual detectors). To cover the
range of useful image frequencies, the image was analysed at
three scales, with each scale separated by two octaves
(corresponding to the bandwidth of the operators). To minimize
computer time, the analysis at the lower scales was achieved
by reducing the image size by a factor of four and using the
same masks as for the higher scales. The images were reduced,
and later expanded, by the Gaussian pyramid technique of Burt
and Adelson (1983). As well as reducing computer time, the
pyramid compression technique maintains the mask sample grain
at all scales, which is consistent with data from human visual
detectors.
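The REDUCE operation of Burt and Adelson (1983) can be sketched as a separable blur with their five-tap generating kernel followed by dropping every second row and column; applying it twice reduces an image by a factor of four. The kernel weights (1/16, 1/4, 3/8, 1/4, 1/16, i.e. their parameter a = 0.375) and the clamped border handling are assumptions, since the text does not give them. An image of 2^N + 1 pixels per side, such as 513, reduces to 257 and then 129.

```python
# Burt-Adelson generating kernel with a = 0.375 (assumed): 1/16, 1/4, 3/8, 1/4, 1/16
KERNEL = [0.0625, 0.25, 0.375, 0.25, 0.0625]


def reduce_1d(samples):
    """Blur with KERNEL and keep every second sample (n -> (n + 1) // 2)."""
    n = len(samples)
    out = []
    for i in range(0, n, 2):
        acc = 0.0
        for offset, weight in zip(range(-2, 3), KERNEL):
            j = min(max(i + offset, 0), n - 1)   # clamp at the borders (assumed)
            acc += weight * samples[j]
        out.append(acc)
    return out


def reduce_2d(image):
    """One REDUCE step: halve both dimensions of a 2-D list of floats."""
    rows = [reduce_1d(row) for row in image]
    cols = [reduce_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def reduce_by_four(image):
    """Two REDUCE steps, i.e. one two-octave change of scale."""
    return reduce_2d(reduce_2d(image))
```

Because the kernel weights sum to one, a uniform image keeps its value at every scale, while detail finer than the new sampling grid is blurred away before subsampling.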

Figure 3 illustrates feature marking of an image at three 
scales. The upper figures show the original image, and the 
image reduced by factors of 4 and 16. The lower figures show 
the summed feature maps at each scale. As with the previous 
example, local energy was calculated separately for each 
orientation, and features marked at local maxima of energy 
along the direction of maximum energy gradient. The intensity 
at each marked feature corresponds to the amplitude of the 
local energy at that point. The four maps were summed across 
orientations at each spatial scale to produce the feature maps 
of figure 3. 




FIGURE 3 

Application of the algorithm to a 513 × 513 pixel image at three
spatial scales. The original image (A) was reduced using the
Gaussian pyramid reduction technique, by a factor of four (B)
and again by another factor of four (C). For each of the three
images, local energy was calculated at four orientations and
features were marked at local maxima along the locally steepest
energy gradient. The strength of each feature was given by the
amplitude of the energy function at that point. The lower
images (D, E and F) show the feature maps at each scale, summed
over orientation.




At each scale, the model marked the features of interest
reasonably well, without need for thresholding or other noise
rejection techniques. All types of features (borders, lines,
shading, speckle, etc.) were marked with equal precision. To
define the nature of each marked feature, we return to the
linear stage of the model and consider the relative amplitudes
of the even- and odd-symmetric operators. Positive amplitude
in the even-symmetric convolution signals a bright line, and
negative amplitude a dark line. Positive or negative amplitude
in the oriented odd-symmetric convolution signals
positive- or negative-going edges. The operation at the highest
scale provided the best sketch of the image, but the maps at the
lower scales are also consistent with the outline of the baby
face.

For many purposes, the feature maps at each scale would be 
a sufficient description of the image. It may be useful, 
however, to combine the information of the three scales to 
obtain a total feature map. We suggest that this could be 
done by expanding the maps at the lower scales (using the Burt 
and Adelson technique) to the size of the original image, and 
adding the three maps. During the expansion the amplitude is
not scaled, so that the volume of the marked features remains
the same as for the condensed image (but the amplitude is
reduced). This has the effect of privileging information from
the higher spatial scales. The technique is biologically
reasonable, as it is known that the accuracy of position
judgments scales with the spatial frequency content of the image
(Klein and Tyler, 1981).
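The expansion and summation can be sketched with Burt and Adelson's EXPAND step. Their standard EXPAND multiplies by 4 to preserve amplitude; setting `preserve_volume=True` omits that factor, so, as described above, the summed value of a marked feature is kept while its amplitude falls by 4 per expansion (16 per factor-of-four change of scale). The kernel weights, the handling of samples beyond the border (simply skipped) and the helper names are assumptions.

```python
KERNEL = [0.0625, 0.25, 0.375, 0.25, 0.0625]   # assumed, as for REDUCE


def expand_1d(samples, gain):
    """EXPAND along one axis: n samples -> 2n - 1 samples."""
    n = len(samples)
    out = []
    for i in range(2 * n - 1):
        acc = 0.0
        for offset, weight in zip(range(-2, 3), KERNEL):
            # only integer source positions inside the input contribute
            if (i - offset) % 2 == 0 and 0 <= (i - offset) // 2 < n:
                acc += weight * samples[(i - offset) // 2]
        out.append(gain * acc)
    return out


def expand_2d(image, preserve_volume=True):
    """Double both dimensions of a 2-D list.

    preserve_volume=True drops Burt and Adelson's factor of 4, so summed values
    are kept and amplitudes fall; False applies the factor and keeps amplitudes.
    """
    gain = 1.0 if preserve_volume else 2.0   # per axis, so 4 overall in 2-D
    rows = [expand_1d(row, gain) for row in image]
    cols = [expand_1d(list(col), gain) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def combine_scales(fine, middle, coarse):
    """Expand the coarser feature maps to full size (volume preserved) and add them."""
    middle_up = expand_2d(expand_2d(middle))
    coarse_up = coarse
    for _ in range(4):
        coarse_up = expand_2d(coarse_up)
    return [[a + b + c for a, b, c in zip(r0, r1, r2)]
            for r0, r1, r2 in zip(fine, middle_up, coarse_up)]
```

With maps of 513, 129 and 33 pixels per side, `combine_scales` returns a 513 × 513 map in which a feature from the coarsest scale is spread over 256 times as many pixels, at 1/256 of its original amplitude.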

An example of a combined feature map is shown in figure
4F. This image is the sum of the maps from figure 3, after
expanding the lower scales. Because of the automatic
privileging of the smaller scales, they make the greatest
contribution to the map. However, other features of interest,
such as the outline on the left of the face, are better marked
at the lower scales. Given that there is no thresholding or
post-processing, the map depicts the main features of the
image reasonably well.

Visual Illusions 

A strong test of a model of human vision is whether it 
"sees" illusory forms in the ways that humans do. Many 
researchers (e.g. Barlow, 1959; Marr, 1976) have suggested 
that the human visual system searches for features of 
interest, and encodes them in some form of feature map, which 
would be a useful and economical code of the image (a similar
process is almost certainly necessary in computer vision). If
perceptual processes rely exclusively on a feature map for 
interpretation and structuring of images, much original image 
information could be lost or inaccessible. 

Figure 4A is an example of a well-known illusion, resulting
from coarse quantization of an image (Harmon and Julesz,
1973). Although the image contains sufficient information at
low spatial frequencies for recognition (verified by blurring
the image, or viewing it from a distance), the image cannot be
recognized under normal viewing conditions: it appears as a
collection of jumbled blocks. The edges of the blocks dictate
the perceptual organization of the image, so that the observer
loses access to the low frequency information (see Morrone,
Burr and Ross, 1983).

Figure 4D shows the summed feature map obtained from the
local energy algorithm, following the same procedure as that
used to mark the features of the original image (figure 4F). The
markings follow the outlines of the blocks, and are in no way
associated with the form of the figure. If perceptual
organization were based on a map of this type, the image would
appear as a jumble of blocks, as human observers perceive it.

After digital blurring (figure 4B), the blocked image
becomes recognizable. The feature map of the blurred image
(again with the same algorithm) is shown in figure 4E.




FIGURE 4 

Application of the feature detection algorithm to coarsely
quantized images. The image of figure C (512 × 512) was blocked
into a 16 × 16 matrix by averaging the luminance within each
block (figure A). Figure B is a lowpass filtered version of
figure A. On the right are the feature maps of the images,
summed over orientation and three spatial scales (see text).
The features of figure D follow the block outline of figure A,
similar to the human perception of the figure. After blurring,
however, the feature map is consistent with the outline of the
original image. Figure F shows the summed feature map of the
original image.




Although this map is not a very good description of the image, 
the markings do tend to follow the outline of the face, and to 
point to key features, such as the eyes. Recall that local
energy marks all features: light and dark lines, and
positive- and negative-going edges. If the nature of the
features were taken into account, a reasonable resynthesis
might be possible. This aspect is currently being investigated.

Figure 5 shows another illusory figure, designed to 
demonstrate how congruence of arrival phase, and hence peaks 
in local energy, organize the perceptual appearance of images. 
Closely viewed, it appears as a chevron pointing left, while 
from a distance the chevron seems to point rightward. The 
equation for the figure is given in the caption. Each row
comprises the sum of 64 cosine harmonics. The phase (but not
the amplitude) of the harmonics changes systematically from 0
in the middle row to π in the upper and lower rows. This
causes a systematic leftward shift in average luminance,
resulting from the phase shift of the lower harmonics. There
is also a group phase delay, such that the point where all
harmonics have the same arrival phase shifts systematically
rightward.

The lower figures show the feature maps from the energy
model, for the original figure (right) and for one blurred by
digital lowpass filtering (left). Peaks in local energy occur
at points of congruence of arrival phase (irrespective of the
argument value at that point), so the feature map follows the
lines of phase congruence and describes a chevron pointing
leftward. Note that the features marked can be edges (arrival
phase π/2 or 3π/2), lines (arrival phase 0 or π), or
combinations of edges and lines (all other phases). The local
energy function marks all features equally well (whereas this
pattern would clearly confuse most other algorithms). If
perception were guided by a feature map similar to that given
by peaks in local energy, the perception should be of a
leftward pointing chevron, even though the average luminance
moves in the other direction. When the pattern is blurred

FIGURE 5 

When viewed up close the upper figure appears as a chevron
pointing leftwards; from a distance, or when blurred, it seems
to point rightwards. The luminance profile L(x,y) of the
waveform is given by:

$$ L(x,y) = L_0 + 4a \sum_{k} \cdots \cos\left(2\pi \cdots \frac{y}{2T} \cdots\right) \qquad \text{for } y < 0 $$

$$ L(x,y) = L(x,-y) \qquad \text{for } y > 0 $$

for k an odd integer, 0 < x < 4T and -T < y < T. L_0 is the mean
luminance, a the amplitude and T the period. The lower figures
show the feature maps, summed over four orientations and three
scales. The right figure is the map of the original image, the
left that of the image blurred by digital filtering. Both maps
predict the perceived structure, suggesting that the human
visual system constructs maps similar to these.

(analogous to the optical blurring of distant viewing), the
higher harmonics that reinforce the local energy peaks are
attenuated, and the peaks in local energy tend to follow the
organization of local luminance.

The final example of the action of the model is 
illustrated in figure 6. The input image is the upper left 
figure. The background pattern is made up of pyramidal shapes, 
created by multiplying a vertical with a horizontal 
triangle-wave luminance profile. The inner patterns are 
identical to the outer, on a smaller scale. There are two 
illusory effects with this pattern. Firstly, the perceived 
bright and dark crosses are illusory: the luminance profile is 
locally pyramidal with the apexes at the centre of the 
perceived crosses. The crosses are the two-dimensional 
equivalent of Mach bands (Mach, 1865). Another illusory effect
is that the borders of the innermost pattern do not seem to be 
straight, but bulge inwards. If the pattern is viewed from a 
distance, the borders of all the inner patterns bulge in the 
same way. 

The other images of figure 6 show the feature maps of the
local energy model (for the middle scale only). In this case
the maps do not depict total energy (as in the previous
examples), but the contribution of the various types of
features. The upper right photograph is the map of dark lines,
the lower left bright lines and the lower right edges (of both
polarities). The maps were produced by first marking all
features at each orientation (by the method described
previously), and then determining the amplitude of the feature
(at each orientation) from the output of the linear
convolution. The dark lines are given by the absolute value of
negative values from the even-symmetric operator, the bright
lines by positive values of the even-symmetric operator, and
edges by the output of the odd-symmetric operator (positive and
negative).
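A sketch of how the separate maps of figure 6 could be formed at one orientation, assuming that the feature marks and the two linear responses are stored as equally sized 2-D lists; the function name and argument layout are illustrative, not taken from the text.

```python
def split_feature_maps(marks, even, odd):
    """Split marked features into dark-line, bright-line and edge maps.

    marks: non-zero where a feature was marked at this orientation.
    even, odd: the linear even- and odd-symmetric responses at the same orientation.
    """
    h, w = len(marks), len(marks[0])
    dark = [[0.0] * w for _ in range(h)]
    bright = [[0.0] * w for _ in range(h)]
    edges = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if marks[y][x] == 0:
                continue                    # only marked points contribute
            e, o = even[y][x], odd[y][x]
            if e < 0:
                dark[y][x] = -e             # absolute value of a negative even response
            else:
                bright[y][x] = e            # positive even response
            edges[y][x] = abs(o)            # odd response, both polarities
    return dark, bright, edges
```

Unmarked pixels stay at zero in all three maps, however large their linear responses, so the line and edge maps inherit the sparseness of the energy-peak marking.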




FIGURE 6 

Examples of separate line and edge maps of a complex image.
Figure A shows the image, B the map of marked dark lines, C
the map of marked bright lines and D the map of edges (of both
polarities). The outer part of the image can be defined as the
product of a vertical and a horizontal triangle-wave, with four
periods per picture. The inner versions are identical, each
scaled down by a factor of two. The edge and line maps predict
the illusions created by the stimulus, both the cross-like Mach
bands and the bowing of the inner borders with increased
viewing distance.




The crosses are well marked by the model, either as bright
or dark lines; they do not appear in the edge map. The edge
map marks the borders of the larger inserts well (as straight
lines), but marks the innermost border only very weakly. On the
other hand, the inner border is marked as a dark line, curving
inwards (as it appears to observers). Again the model
predicted the appearance of a complex pattern well, and showed
the importance of lines and edges in structuring the
perceptual organization of visual images.



DISCUSSION 

We have presented an algorithm for edge and line detection
designed to mimic the human visual response to image features.
The algorithm locates features by calculating the local energy
function (defined as the Pythagorean sum of the image
convolved with operators related by the Hilbert transform) and
searching for local maxima. It then identifies the nature of
the marked feature by evaluating the amplitudes of the two
separate convolution outputs at the point where the feature
was marked. The algorithm was tested with simple images, and
with images that cause illusory effects. In all cases the
algorithm marked features at the points where observers see
them, even when the perceived feature was not obvious from the
luminance profile.

As the algorithm is primarily a model of human vision, we
were guided by current knowledge of the mechanisms of human
and animal visual systems, and chose the parameters of the
algorithm from psychophysical and electrophysiological data.
In the primary visual cortex of primates (and most other
mammals), all neurones are selective to stimulus size and
orientation (e.g. Maffei and Fiorentini, 1973; Hubel and
Wiesel, 1977). There exist two classes of cells, one
quasi-linear (simple cells) and the other clearly non-linear
(complex cells). Simple cells have a limited spatial frequency
response (1-2 octaves) with line-spread functions that tend to
be even- or odd-symmetric (Kulikowski and Bishop, 1981). These
cells are grouped in the cortex in a way that could enable
them to act as the matched filters of the linear stage of the
algorithm (Pollen and Ronner, 1981). Complex cells also have a
limited spatial frequency response, but exhibit a clear
second-order non-linearity (Spitzer and Hochstein, 1985),
consistent with the squaring and summing stage of local energy
extraction.
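The analogy can be illustrated with a textbook energy-model idealization (in the spirit of Adelson and Bergen, 1985), not a fit to physiological data: two Gabor receptive fields in quadrature phase stand in for simple cells, and the square root of the sum of their squared outputs stands in for a complex cell. The peak frequency (0.125 c/sample) and envelope width (σ = 8 samples) are assumed values.

```python
import math


def gabor_pair(n=65, f0=0.125, sigma=8.0):
    """Even (cosine) and odd (sine) Gabor receptive fields centred on the array."""
    c = n // 2
    envelope = [math.exp(-(x - c) ** 2 / (2 * sigma ** 2)) for x in range(n)]
    even = [g * math.cos(2 * math.pi * f0 * (x - c)) for x, g in enumerate(envelope)]
    odd = [g * math.sin(2 * math.pi * f0 * (x - c)) for x, g in enumerate(envelope)]
    return even, odd


def grating(n, f, phase):
    """Sinusoidal luminance profile (mean removed) with the given spatial phase."""
    c = n // 2
    return [math.cos(2 * math.pi * f * (x - c) + phase) for x in range(n)]


def cell_responses(stimulus, even, odd):
    """Two linear 'simple-cell' outputs and their energy ('complex-cell') output."""
    s_even = sum(s * w for s, w in zip(stimulus, even))
    s_odd = sum(s * w for s, w in zip(stimulus, odd))
    return s_even, s_odd, math.hypot(s_even, s_odd)
```

Shifting the phase of a grating at the preferred frequency swaps the response between the even and odd units (the odd output is zero at phase 0, the even output at phase π/2), while the energy output stays essentially constant: the phase invariance characteristic of complex cells.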

Features were marked by local energy maxima only along the
orientation of maximum energy gradient. A plausible biological
mechanism for this non-maximum suppression may be
"cross-orientation inhibition", observed electrophysiologically
in man and other mammals (Morrone, Burr and Maffei,
1982; Morrone and Burr, 1986; Burr and Morrone, 1987). Cells
of different orientation selectivity inhibit each other, so
that the relative response of the preferred orientation is
increased.

There is as yet no strong evidence that biological visual 
systems do calculate a function similar to local energy, or 
use it to mark features. However, the operations in the model 
are biologically plausible, and are appropriate for a limited 
bandwidth parallel computer, such as the human brain. The fact 
that the algorithm behaves quantitatively like humans (Morrone 
and Burr, 1988; Ross, Morrone and Burr, 1989; Burr and 
Morrone, 1990) and "sees" the same illusions that human 
observers do is encouraging. 

Although designed primarily as a model of human vision, 
the algorithm may be a useful stage in computer visual systems 
as a precise and robust edge and line detection system. The 
algorithm has several potential advantages over current 
methods of feature detection. Both lines and edges, as well as 
other features with no local symmetry, are detected 
simultaneously. Each feature is marked uniquely, reducing the 
problem of false marking. It is relatively robust to noise, as 
local energy is maximum at points of congruence of arrival 
phase. Noise (by definition) has a random phase spectrum, and 
hence a low probability of arrival phase congruency at any
point. Chance aggregations of arrival phase will tend to occur
at random isolated points, easy to remove by standard
rejection techniques.
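The noise argument can be checked numerically with a short Monte Carlo sketch, assuming equal-amplitude harmonics with independent, uniformly distributed arrival phases (64 harmonics, as in the figure 5 stimulus). The normalized resultant, which is 1 at a perfect line or edge, averages roughly sqrt(π/4n), about 0.11 for n = 64.

```python
import cmath
import math
import random


def phase_congruency(amplitudes, phases):
    """|sum a*exp(i*phase)| / sum a for harmonics arriving with the given phases."""
    total = sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases))
    return abs(total) / sum(amplitudes)


def mean_noise_congruency(n_harmonics=64, trials=2000, seed=1):
    """Average congruency when every arrival phase is drawn uniformly at random."""
    rng = random.Random(seed)                  # seeded for reproducibility
    amps = [1.0] * n_harmonics
    return sum(phase_congruency(amps, [rng.uniform(0, 2 * math.pi) for _ in amps])
               for _ in range(trials)) / trials
```

The value shrinks roughly as 1/sqrt(n) as more harmonics contribute, which is why chance peaks of congruency in noise tend to be weak and isolated.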

With the current implementation, the algorithm is
expensive in computer time, requiring 24 separate convolutions
of the input image, 8 of them on the full-size image. However,
with specialized hardware, and use of orientation pyramid
techniques (Adelson and Simoncelli, 1987), this time could be
reduced considerably, possibly to acceptable commercial
levels. We are currently evaluating the performance of the
algorithm (accuracy, noise immunity, etc.) against standard
algorithms.
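A back-of-envelope count of the cost, assuming direct spatial convolution with 31 × 31 masks, two masks (even and odd) at each of four orientations, and the 513, 129 and 33 pixel images produced by successive factor-of-four pyramid reductions of a 513 × 513 input. FFT-based or separable filtering would change the absolute figures.

```python
def multiply_adds(sizes=(513, 129, 33), mask=31, orientations=4, filters_per_orientation=2):
    """Multiply-adds for direct spatial convolution of every mask at every scale."""
    per_pixel = mask * mask                               # one mask application
    convs_per_scale = orientations * filters_per_orientation   # 8 convolutions per scale
    return sum(convs_per_scale * s * s * per_pixel for s in sizes)
```

`multiply_adds()` returns 2,159,551,512, roughly 2.2 × 10^9 multiply-adds per image, of which the 16 convolutions on the reduced images account for only about 7%; the 8 full-size convolutions dominate the cost.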



ACKNOWLEDGEMENTS 

This work was supported by grants from the Australian National
Health and Medical Research Council, and the Italian Consiglio
Nazionale delle Ricerche, Progetto Finalizzato Robotica
(contract N. 89.00536.67).



REFERENCES 



Adelson, E.H. & Bergen, J.R. (1985) Spatio-temporal energy
models for the perception of motion. J. Opt. Soc. Am. A2
284-299.

Adelson, E.H. & Simoncelli, E. (1987) Orthogonal pyramid
transforms for image coding. Proc. SPIE 845 50-58.

Anderson, S.J. & Burr, D.C. (1985) Spatial and temporal
selectivity of the human motion detection system. Vision
Res. 25 1147-115.

Anderson, S.J. & Burr, D.C. (1987) Receptive field sizes of
human motion detectors. Vision Res. 27 621-635.

Barlow, H.B. (1959) Possible principles underlying the
transformations of sensory messages. In Sensory Communication
(Edited by W.A. Rosenblith). MIT Press, Mass., USA.

Blakemore, C. & Campbell, F.W. (1969) On the existence of
neurones in the visual system selectively sensitive to
the orientation and size of retinal images. J. Physiol.
(Lond.) 225 437-455.

Burr, D.C. (1987) Implications of the Craik-O'Brien illusion
for brightness perception. Vision Res. 27 1903-1913.

Burr, D.C. & Morrone, M.C. (1987) Inhibitory interactions in
the human visual system revealed in pattern visual evoked
potentials. J. Physiol. (Lond.) 389 1-21.

Burr, D.C. & Morrone, M.C. (1990) Edge detection in biological
and artificial visual systems. In Vision: Coding and
Efficiency (Edited by C. Blakemore). CUP, Cambridge.

Burr, D.C., Morrone, M.C. & Spinelli, D. (1989) Evidence for
edge and bar detectors in human vision. Vision Res. 29
419-431.

Burt, P.J. & Adelson, E.H. (1983) The Laplacian pyramid as a
compact image code. IEEE Trans. COM 31 532-540.

Canny, J.F. (1983) Finding edges and lines in images. MIT AI
Lab. Tech. Report 720.

Field, D.J. & Nachmias, J. (1984) Phase reversal discrimination.
Vision Res. 24 333-340.

Harmon, L.D. & Julesz, B. (1973) Masking in visual
recognition: effect of two-dimensional filtered noise.
Science 180 1194-1197.

Hildreth, E.C. (1985) Edge detection. MIT AI Lab. Memo 835.

Hubel, D.H. & Wiesel, T.N. (1977) Architecture of macaque
monkey visual cortex. Proc. R. Soc. Lond. B198 1-59.

Klein, S.A. & Tyler, C.W. (1981) Phase discrimination of
single and compound gratings. Invest. Ophthal. Vis. Sci.
S20 124.

Kulikowski, J.J. & Bishop, P.O. (1981) Linear analysis of
the responses of simple cells in the cat visual cortex.
Exp. Brain Res. 44 386-400.

Mach, E. (1865) Über die Wirkung der räumlichen Vertheilung
des Lichtreizes auf die Netzhaut. S.-B. Akad. Wiss.
Wien, math. 54 303-322.

Maffei, L. & Fiorentini, A. (1973) The visual cortex as a
spatial frequency analyzer. Vision Res. 13 1255-1267.

Marr, D. (1976) Early processing of visual information. Phil.
Trans. R. Soc. Lond. B275 485-526.

Marr, D. (1982) Vision. Freeman, San Francisco.

Marr, D. & Hildreth, E. (1980) Theory of edge detection. Proc.
R. Soc. Lond. B207 187-217.

Morrone, M.C. & Burr, D.C. (1986) Evidence for the existence
and development of visual inhibition in humans. Nature
321 235-237.

Morrone, M.C. & Burr, D.C. (1988) Feature detection in human
vision: a phase-dependent energy model. Proc. R. Soc.
Lond. B235 221-245.

Morrone, M.C., Burr, D.C. & Maffei, L. (1982) Functional
significance of cross-orientational inhibition: part I.
Neurophysiology. Proc. R. Soc. Lond. B216 335-354.

Morrone, M.C., Burr, D.C. & Ross, J. (1983) Added noise
restores recognition of coarse quantised images. Nature
305 226-228.

Morrone, M.C., Ross, J., Burr, D.C. & Owens, R. (1986) Mach
bands depend on spatial phase. Nature 250-253.

Morrone, M.C. & Owens, R. (1987) Feature detection from local
energy. Pattern Rec. Letters 1 103-113.

Oppenheim, A.V. & Lim, J.S. (1981) The importance of phase in
signals. Proc. IEEE 69 529-541.

Pollen, D.A. & Ronner, S.F. (1981) Phase relationships between
adjacent simple cells in the visual cortex. Science 212
1409-141.

Ross, J., Morrone, M.C. & Burr, D.C. (1989) The conditions for
the appearance of Mach bands. Vision Res. 29 699-715.

Spitzer, H. & Hochstein, S. (1985) A complex-cell receptive
field model. J. Neurophysiol. 53 1266-1286.

Watt, R.J. & Morgan, M.J. (1985) A theory of the primitive
spatial code in human vision. Vision Res. 25 1661-167.

Yuille, A.L. & Poggio, T. (1985) Fingerprint theorems for
zero-crossings. J. Opt. Soc. Am. A2 683-692.




Visualizing and Understanding Patterns of Brain Architecture 



Alan Rojer 
Eric L. Schwartz 

Department of Computer Science 
Courant Institute of Mathematical Sciences 
715 Broadway
New York, N.Y. 10003 

Computational Neuroscience Laboratories 
New York University School of Medicine 
Department of Psychiatry 
550 First Avenue 
New York, N.Y. 10016 

ABSTRACT 

We illustrate the application of computer science to neuroscience at three levels: measuring,
modeling, and understanding the computational function of the columnar pattern of ocular
dominance in primate visual cortex. We review our methods for the quantitative
reconstruction of the pattern of binocular input to the visual cortex of monkeys. We show that an
oriented bandpass filter, applied to white noise, provides a simple parametric
characterization of the observed pattern. We suggest a computational motivation for the columnar
architecture as a “brain data structure” for a stereo vision algorithm based on the properties
of a nonlinear filter, the cepstrum. This work illustrates some of the algorithmic difficulties
and novel research problems encountered when computational approaches are used to
visualize the patterns of neural architecture of the primate visual system.

Computational neuroscience: beyond rendering 

One way of defining the term “computational neuroscience” is as the set of problem
domains in which there is a nontrivial overlap of neuroscience and computer science
(Schwartz 1990). “Non-trivial” is a loaded term. It was not so long ago that the use of
computer data acquisition in microelectrode and EEG experimentation was sufficiently
exotic to be considered a “computational” topic. But today, this type of application is
routine. Real-time control of electrophysiological data acquisition is a standard skill; it
has become a problem of neuroscience, and not of computational neuroscience.

Applications of computer graphics to anatomical reconstruction have similarly
become routine, at least at the level of displaying a simple 3D model based on either
voxel or polygon solid models. Most vendors of high-end graphics equipment provide
prepackaged routines for the display of brain data. Given a set of serial sections of brain,
either in images (voxel data structure) or wire-frame (polygon data structure) form, it is
routine to construct a hidden surface rendering on a graphics display. Such renderings of
brains have become a staple of educational television and popular science publications.
But these reconstructions do not address the deeper scientific issues of meaningful
modeling of neural structures and, more importantly, of understanding why they are
important for brain function.

Supported by AFOSR #F85-0235, #88-0275, the System Development Foundation and the Nathan S. Kline Psychiatric
Research Center.

We emphasize that the practice of computational neuroscience naturally divides into 
three levels: measurement and reconstruction, modeling, and computational significance. 
In the present paper, we will review recent work from our lab at each level. Specifically, 
we will apply computational methods to the elucidation of the structure and function of a 
particular architectural feature of the primate brain: the ocular dominance column (ODC) 
system, which comprises the first area of interaction of the projections from the left and 
right eyes to the visual cortex in primates. Since all the examples in this paper are drawn 
from this system, the diverse areas of expertise which are involved are somewhat unified. 
This work provides a fair sample of the scope of the research problems which character- 
ize computational neuroscience. 

1. Level 1: Measurement of the Macaque Ocular Dominance Column System 

Binocular stereo vision is based on the brain's ability to detect and utilize small
differences between a pair of projections of a single 3D scene, due to the displacement of
the two eyes. In monkeys, information from the two eyes is not integrated until it
reaches primary visual cortex (V1, also known as striate cortex or area 17). When the
roughly 10⁶ neural fibers (axons) from each eye enter V1, they segregate into a striking
pattern of stripe-like domains, as shown in Figure 1. The ODC pattern is shared by many
primate species, as well as other mammals (LeVay et al. 1975).

There are more than 20 functionally distinct areas of visual cortex in primates, all of
them folded, twisted, and stacked up in a very uncongenial fashion. Depending
on the anatomical and functional criteria, there are some 5-10 distinct layers in each
cortical area. In monkey cortex, which is about 1 mm thick, the central layer of the cortex
where the pattern is most prominent is several hundred microns thick. Sections from a 
serially sliced brain are shown in Figure 2. The gap between the anatomical data of Fig- 
ure 2 and the large-scale pattern of Figure 1 illustrates the challenge in visualizing neural 
patterns. 

Computational neuroanatomy: problem specification 

The previous brief introduction presents the neuroanatomist's dilemma. When a
brain is cut in conventional serial sections (as in Figure 2), it is very difficult to visualize 
the kinds of patterns which exist in the tangential plane of the cortex. Patterns of 
interest are embedded in interior layers of cortex. Large sections of the cortex are 
obscured by the folds of sulci and gyri. The data is distributed in a series of cross- 
sections which obscure the tangential structure which is of interest. This leads us to for- 
mulate some essential problems of computational anatomy. Given a set of serial sections 
of cortex, we wish to: 




1. Construct a high-resolution solid model of the surface of the cortex.

2. Peel the layers apart and render them in gray-scale.

3. Construct joint voxel and polyhedral surface models.

4. Unfold the polyhedral cortex into a planar model.

5. Texture map the 3D image pattern into 2D.

Figure 1 A computer-flattened brain peel at the level of layer IV of visual cortex of a
macaque monkey, stained to reveal the pattern of ocular dominance columns (dark: left eye;
light: right eye).

We omit from this discussion the difficult problem of producing the pattern of ana- 
tomical “stain” which reflects the neural pattern of interest. This is a problem of neuros- 
cience, not computational neuroscience. We will assume that we have obtained a 
“stain” which marks out the patterns of interest to us. 

Construction of a solid (voxel) model of cortex 

Given a set of raw serial sections, appropriately stained so that the ODC stripes are 
prominently marked (e.g. the left eye projection is dark and the right eye projection is 
light), the first problems encountered are digitizing, aligning, and conditioning the digi- 
tized sections. The serial sections comprise thin slices of brain tissue, stained and 
mounted on glass slides (see Figure 2). The sections are 40 µm thick, and there are about
four hundred sections for V1. We digitize our sections using a CCD camera which is
mounted under a conventional photographic enlarger. The slides are small enough to be
inserted directly into the film carrier, and appropriate optics minify the image onto the
surface of a CCD chip, which is then digitized in NTSC video format. The CCD chip is
mounted on a rotation and translation stage, which can be manipulated with great precision.

Figure 2 On top are five serial sections, cut in the coronal plane of the visual cortex of the
monkey used in Figure 1. Even though the cortex is a simply connected shell, the
complexity of folding of the surface leads to multiply-connected components in section. The
evolution of a saddle point (dark figure eight in section 153), which is internal to the main
component of cortex, can be observed. On the bottom are the corresponding contours
produced by intersecting a brain peel (see Figure 3) with a simulated cutting plane. These five
contours correspond to the sections shown above, and lie roughly in layer IV of the cortex.

To reconstruct the solid brain, sections cut from the face of a block of brain must be 
precisely realigned. A certain amount of distortion affects each section. In principle, 
such distortion may be fixed by generalized image warping of the sections. Fortunately, 
we have had satisfactory results from similarity transformations (size, shift and rotation). 
We have not been forced to include general affine or polynomial warps in order to
reconstruct the samples of V1 which we have processed.

It is a common practice to insert pins through a brain to be sectioned, and then to
align adjacent sections using these pin marks. To avoid damaging the tissue, we instead
photograph the frozen brain block as it is being cut. This provides a record of the
physical position of each serial section before it was cut, in the form of a “movie” of brain
sections. The frames of this movie are digitally toggled with the images of the sections in
the digitizer, and the sections are moved physically to achieve alignment. This method
works extremely well, allowing us to align a full set of serial sections with an estimated
RMS error of about 150 µm (Schwartz and Merker 1986b).
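The alignment model above is a similarity transform (scale, rotation, shift). The sketch below fits one by least squares and reports the RMS residual. It assumes that corresponding landmark points have already been picked on two sections. That landmark step is our own assumption for illustration, because the sections here were aligned physically against the movie frames. The function names are hypothetical.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale, rotation, shift) taking the
    (N, 2) landmark array src onto dst. Returns (scale, angle_rad, shift)."""
    z = src[:, 0] + 1j * src[:, 1]          # 2D points as complex numbers
    w = dst[:, 0] + 1j * dst[:, 1]
    zc, wc = z - z.mean(), w - w.mean()     # remove the shifts first
    a = np.vdot(zc, wc) / np.vdot(zc, zc)   # a = scale * exp(i * angle)
    b = w.mean() - a * z.mean()
    return float(abs(a)), float(np.angle(a)), np.array([b.real, b.imag])

def apply_similarity(pts, scale, angle, shift):
    """Rotate by angle, scale, then shift an (N, 2) array of points."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return scale * pts @ rot.T + shift

def rms_error(a, b):
    """Root-mean-square distance between corresponding points."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Hypothetical landmarks on one section and their positions on the next.
rng = np.random.default_rng(0)
src = rng.uniform(0.0, 10.0, size=(20, 2))
dst = apply_similarity(src, 1.05, 0.1, np.array([3.0, -2.0]))
scale, angle, shift = fit_similarity(src, dst)
print(scale, angle, shift, rms_error(apply_similarity(src, scale, angle, shift), dst))
```

Treating points as complex numbers reduces the fit to one complex division. With noisy landmarks, the RMS residual plays the role of the alignment error quoted above.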

Once the sections are aligned and digitized, they undergo a series of conditioning 
steps. 






1. Thresholding to isolate the gray matter of the cortex from the other brain 
structures in the digitized image. 

2. Histogram equalization to stretch contrast and normalize the range of gray 
scale. 

3. Image filtering to repair damage due to blood vessels, microtome knife
chatter, etc.

4. Interactive repair to ensure that no topological defects exist in the images.

Figure 2 shows the final results of these processes.
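Steps 1 and 2 can be sketched on an 8-bit image held in a NumPy array. Two choices here are assumptions of this sketch rather than details from the text: the fixed threshold `level`, and equalizing only the pixels inside the gray-matter mask. Steps 3 and 4 are omitted.

```python
import numpy as np

def gray_matter_mask(img, level):
    """Step 1 (sketch): pixels at or above a fixed stain `level` count as tissue."""
    return np.asarray(img) >= level

def equalize_in_mask(img, mask, levels=256):
    """Step 2 (sketch): histogram-equalize the masked pixels of an 8-bit image
    so that they span 0..levels-1; pixels outside the mask are set to 0."""
    img = np.asarray(img, dtype=np.int64)
    vals = img[mask]
    hist = np.bincount(vals, minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[hist > 0][0]                 # cdf at the darkest occupied level
    span = max(cdf[-1] - cdf_min, 1.0)
    lut = np.round((cdf - cdf_min) / span * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    out = np.zeros(img.shape, dtype=np.uint8)
    out[mask] = lut[vals]
    return out

section = np.array([[10, 10, 200], [20, 30, 40]])
mask = gray_matter_mask(section, 15)
print(equalize_in_mask(section, mask))         # rows [0, 0, 255] and [0, 85, 170]
```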

Peeling the Brain 

The steps so far described yield an implicit solid model of the brain in the form of 
several hundred images, each of a single serial section. Since these sections are aligned 
and conditioned, their union is a digital representation of the ODC pattern. But in this 
form the representation is of limited usefulness. As mentioned above, we need access to 
internal layers: we need to peel the brain. A microtome peels the brain by cutting it into 
thin sections, parallel to the knife blade. We wish to digitally cut the brain in curved sec- 
tions which are parallel to the tangent plane of the brain. We now describe two algo- 
rithms to produce brain peels. 

Intersection of a polyhedron with a voxel model 

This algorithm is relatively simple. Given a polyhedral model of an interior layer of
a solid, the polyhedron can be intersected with the 3D voxel model to obtain a voxel sur- 
face. The polyhedral model may be produced by hand tracing of contours on individual 
images of brain sections, followed by conventional (interactive) 3D triangulation. One 
advantage of this method is that it can produce a surface at a known (i.e. physiological)
location, subject only to the accuracy and knowledge of the hand tracing of contours. 
However, this method will only reliably produce a voxel surface within a few voxel 
thicknesses of the traced surface. As one moves further from this surface, the “peels” 
produced this way become unreliable, since they are following a relatively coarse 
approximation to the voxel surfaces. If one wishes to produce a full set of peels (e.g. 25 
peels at 40 micron thickness, spanning the full thickness of cortex), this method is inade- 
quate. It would require a great deal of laborious hand tracing and hand triangulation. 
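The intersection of a polyhedral layer with the voxel model can be approximated by densely sampling each triangle and collecting the voxels that the samples fall in. This is a minimal sketch. It assumes voxel-aligned coordinates and a fixed sampling density, both hypothetical choices. Because it samples points rather than computing the exact overlap, it can miss voxels that a triangle only grazes, which an exact voxelization would catch.

```python
import numpy as np

def triangle_voxels(tri, voxel_size=1.0, samples=20):
    """Voxels hit by a barycentric grid of sample points on one triangle.
    tri is a (3, 3) array of vertex coordinates in the same units as voxel_size."""
    a, b, c = np.asarray(tri, dtype=float)
    hits = set()
    for u in range(samples + 1):
        for v in range(samples + 1 - u):
            w = samples - u - v
            p = (u * a + v * b + w * c) / samples     # a point on the triangle
            hits.add(tuple(int(t) for t in np.floor(p / voxel_size)))
    return hits

def sample_peel(volume, triangles, voxel_size=1.0):
    """Gray values of `volume` at the voxels crossed by a polyhedral layer."""
    voxels = set().union(*(triangle_voxels(t, voxel_size) for t in triangles))
    inside = [v for v in voxels if all(0 <= v[d] < volume.shape[d] for d in range(3))]
    return {v: volume[v] for v in inside}

vol = np.arange(64).reshape(4, 4, 4)
tri = [(0, 0, 0.5), (3, 0, 0.5), (0, 3, 0.5)]          # a triangle in one voxel layer
print(sorted(sample_peel(vol, [tri]).items())[:3])
```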

3D surface tracking with a shield 

Excellent algorithms for 3D surface tracking are known (e.g. Artzy et al. 1981).
Surface tracking provides two surfaces: the inside and the outside of the 3D voxel model. 
This is problematic, because we wish to consider the inside and the outside of the voxel 
model as distinct surfaces, which is the case physiologically. It is thus necessary to con- 
struct a “shield” which will prevent a surface tracker from producing both the inside and 
outside surfaces of the solid shell as a single connected component. 

Successive peeling can also introduce topological problems when applied to complex
surfaces. A given surface may be perforated, thus changing the connectivity of the
residual surface. Small connected components (“dirt”) may escape image conditioning
or may be produced in the process of peeling. These details must be carefully dealt with,
as the process of peeling is subject to extreme topological instability: a single voxel
spuriously connecting two surfaces can change the qualitative nature of the entire peel
being produced.
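The topological fragility described above is easy to reproduce on a toy volume. The sketch below counts the 6-connected components of a boolean voxel array. Two parallel slabs stand in for the inside and outside surfaces of a shell, and a single spurious voxel merges them. Masking that voxel out plays the role of a shield. The toy geometry and the choice of 6-connectivity are assumptions made for illustration.

```python
from collections import deque
import numpy as np

def count_components(mask):
    """Number of 6-connected components of True voxels in a 3D boolean array."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n = 0
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        n += 1                                   # new component: flood-fill it
        seen[start] = True
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in steps:
                q = (x + dx, y + dy, z + dz)
                if (all(0 <= q[i] < mask.shape[i] for i in range(3))
                        and mask[q] and not seen[q]):
                    seen[q] = True
                    queue.append(q)
    return n

vol = np.zeros((5, 5, 5), dtype=bool)
vol[:, :, 1] = True          # "inside" surface
vol[:, :, 3] = True          # "outside" surface
print(count_components(vol))
vol[2, 2, 2] = True          # one spurious bridging voxel
print(count_components(vol))
shield = np.zeros_like(vol)
shield[2, 2, 2] = True       # voxels the tracker is not allowed to enter
print(count_components(vol & ~shield))
```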

Figure 3 shows a peel of a monkey brain produced by this algorithm incorporating 
the ODC pattern. Further details of our work in surface tracking and brain peeling may 
be found in (Frederick and Schwartz 1990). 

Joint Voxel and Polyhedral Models: Automatic Triangulation 

The brain peels which we produce contain all of the relevant data for ODC meas- 
urements. Unfortunately, the peel reflects all the irregularities of curvature and folding 
of the cortical surface. The problem of unfolding and flattening the cortical surface will 
be addressed in the next section; we note here that a later requirement for extensive 
numerical calculations (brain flattening) necessitates a more compact model of the peel 
than that afforded by voxels. We therefore require a conversion from the high- 
resolution, storage-intensive voxel representation of the peel to a piecewise linear 
(polyhedral) representation which captures the geometry of the peel with a much smaller 
storage burden. We thus need to triangulate the voxel surface. 

There have been many algorithms described in the literature for the purpose of
constructing polyhedral surface models, especially from serial sections. Early versions of
these algorithms (e.g. Fuchs et al. 1977) could not deal with surfaces which had
multiple contours per slice. Recently, several algorithms have been described (Anjyo et al.
1987; Boissonnat 1988) which are promising for use with complex surfaces, but they
rely on heuristics which might prove troublesome with highly complex surfaces, such as
cortex.

The difficulties of automatic triangulation can be traced to a single problem: when a 
three dimensional surface is represented by the union of its contours, there can be topo- 
logical changes in the connectivity of the contours which are caused by the existence of 
critical points in the surface. This problem is illustrated with a sectioned torus (Figure 
4). A torus in general position has four critical points, corresponding to places where the 
tangent plane to the torus is horizontal. These can be viewed locally as a maximum, a 
minimum, and two saddle points. The sectional representation of the torus undergoes 
topological changes when these critical points are traversed. 

As the height of the sectioning plane increases, the first nontrivial section
comprises a lone point (locally a minimum). Then the cross section becomes a simple
loop. At the next critical point (a saddle), the section becomes a figure eight; after the
saddle point there are two simple loops. The third critical point is also a saddle (with a
figure-eight section), after which the contours merge back into a single loop. The final
critical point is the sole point in the last nontrivial section.
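The sequence of sections just described can be checked numerically. This sketch cuts a solid torus standing on its rim with horizontal planes. In that position its height function has the four critical points above. For each cut, the sketch counts the connected regions, which stand in for the contours. The figure-eight levels fall between the sampled heights. The radii and grid are arbitrary choices.

```python
from collections import deque
import numpy as np

def torus_section(h, R=2.0, r=0.7, n=201, extent=3.0):
    """Boolean (x, y) raster of the plane z = h cut through a solid torus whose
    symmetry axis is the y axis (tube radius r, centre-circle radius R)."""
    xs = np.linspace(-extent, extent, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    return (np.hypot(X, h) - R) ** 2 + Y ** 2 <= r ** 2

def count_regions(mask):
    """Number of 4-connected components of True pixels in a 2D array."""
    seen = np.zeros(mask.shape, dtype=bool)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        n += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for q in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if (0 <= q[0] < mask.shape[0] and 0 <= q[1] < mask.shape[1]
                        and mask[q] and not seen[q]):
                    seen[q] = True
                    queue.append(q)
    return n

levels = [-2.9, -2.0, 0.0, 2.0, 2.9]
print([count_regions(torus_section(h)) for h in levels])   # [0, 1, 2, 1, 0]
```

The count changes only when the plane passes one of the critical heights ±(R - r) and ±(R + r). This is what lets critical levels be detected from changes of topology.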

The existence of critical points, and hence changes of topology between adjacent 
sections, provides a major challenge to triangulation algorithms. In order to deal with
this problem, we note that we have complete information regarding connectivity of
voxels in adjacent digital sections of a brain peel. Each slice through a peel yields contours
which can then be linked to contours in adjacent slices using the voxel connectivity data.
We can detect critical levels (sections containing critical points) by their attendant
changes of topology. By following these contours between the topological transitions
which mark the critical points, we are able to “parse” the brain surface into a series of
generalized cylinders, which are composed of simple loops, and critical points, where
generalized cylinders are created, merge or vanish.

Figure 3 A brain peel of monkey brain, stained to reveal the ocular dominance column
pattern. This shell is about 40 µm thick, and corresponds roughly to layer IV of V1 (visual
cortex). Shown here is the majority of V1 and some parts of V2 and other extrastriate visual
areas: this peel represents the entire posterior occipital pole of the brain.
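The linking step can be sketched in three parts. First, label the connected regions of each digital section. Next, link regions in adjacent sections that share a pixel, which corresponds to face-adjacency of voxels across sections. Finally, flag the levels where a region has no partner (birth or death) or more than one (merge or split). This is an illustrative sketch, not the parser of Schwartz et al. (1988), and the event names are our own. Each reported level is the index of the first section showing the new topology, and `len(slices)` means "after the last section".

```python
from collections import deque
import numpy as np

def label(mask):
    """4-connected labelling of a 2D boolean array; returns (labels, count)."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for q in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if (0 <= q[0] < mask.shape[0] and 0 <= q[1] < mask.shape[1]
                        and mask[q] and not labels[q]):
                    labels[q] = n
                    queue.append(q)
    return labels, n

def critical_events(slices):
    """Topology-changing events between consecutive sections of a stack."""
    empty = np.zeros_like(np.asarray(slices[0], dtype=bool))
    stack = [empty] + [np.asarray(s, dtype=bool) for s in slices] + [empty]
    labelled = [label(s) for s in stack]
    events = []
    for k in range(1, len(stack)):
        (la, na), (lb, nb) = labelled[k - 1], labelled[k]
        both = (la > 0) & (lb > 0)
        pairs = set(zip(la[both].tolist(), lb[both].tolist()))
        up = [[a for a, b2 in pairs if b2 == b] for b in range(1, nb + 1)]
        down = [[b for a2, b in pairs if a2 == a] for a in range(1, na + 1)]
        kinds = (["birth"] * sum(len(p) == 0 for p in up)
                 + ["merge"] * sum(len(p) > 1 for p in up)
                 + ["death"] * sum(len(p) == 0 for p in down)
                 + ["split"] * sum(len(p) > 1 for p in down))
        if kinds:
            events.append((k - 1, sorted(kinds)))
    return events

a = np.zeros((6, 6), dtype=bool); a[1:3, 1:2] = True; a[1:3, 4:5] = True  # two loops
b = np.zeros((6, 6), dtype=bool); b[1:3, 1:5] = True                      # one loop
print(critical_events([a, b]))
```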

There are theorems in differential topology (Wallace 1968) which state that the
critical points of a differentiable surface¹ are isolated and of only a few basic types. The
small number of possible types of critical points limits the kinds of topological changes
which can occur between adjacent sections. Our strategy is to explicitly deal with each
of the kinds of critical points that are possible (they are illustrated by the torus example),
and thus be able to provide an argument for the correctness of the triangulation.

¹ Our surfaces are substantially over-sampled, since we use 40 µm voxels to represent brain
surfaces whose radius of Gaussian curvature is typically in the range of 1-10 mm. In our
experience, the assumption that our voxel surfaces represent discrete samples of a
differentiable surface is reasonable.

Figure 4 A torus, and the location of its four critical points: a maximum, two saddle points,
and a minimum.

These ideas were recently implemented (Schwartz et al. 1988) by parsing a 
sequence of sections with connectivity data (i.e. a brain peel) into a series of generalized 
cylinders joined by critical points. This program acts as a translator which then con- 
structs a program for a simpler triangulation algorithm which is capable of joining single 
loops. There are many such algorithms. The results of application to a peel of monkey 
brain are shown in Figure 5. This triangulation was accomplished without any human 
interaction. 
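One common way to join two single loops, not necessarily the one used here, is the greedy "shortest diagonal" rule. It walks around both contours, and at each step it adds whichever triangle has the shorter new cross edge. This is a minimal sketch, assuming each loop is given as an ordered list of 3D points. The function name is hypothetical.

```python
import numpy as np

def join_loops(P, Q):
    """Triangulate the band between closed contours P (m points) and Q (n points).
    Returns index triples; P uses indices 0..m-1 and Q uses m..m+n-1."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    m, n = len(P), len(Q)
    shift = int(np.argmin(np.linalg.norm(Q - P[0], axis=1)))

    def q(j):                       # index into Q, starting at the point nearest P[0]
        return (j + shift) % n

    def dist(u, v):
        return float(np.linalg.norm(u - v))

    tris, i, j = [], 0, 0
    while i < m or j < n:
        advance_p = i < m and (j == n or
                               dist(P[(i + 1) % m], Q[q(j)]) <= dist(P[i % m], Q[q(j + 1)]))
        if advance_p:               # new triangle uses the next edge of P
            tris.append((i % m, (i + 1) % m, m + q(j)))
            i += 1
        else:                       # new triangle uses the next edge of Q
            tris.append((i % m, m + q(j), m + q(j + 1)))
            j += 1
    return tris

bottom = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
top = [(x, y, 1.0) for x, y, _ in bottom]
print(join_loops(bottom, top))
```

Each edge of each loop is used exactly once, so joining loops of m and n points always yields m + n triangles. Loops that are strongly offset or very differently shaped are where heuristics of this kind can go wrong.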

Removal of human interaction is an important advance. The brain peeler algo- 
rithms provide about 20 brain peels per cortical area. Interactive triangulation of all this 
data is expensive and error prone. It is very important that our polyhedral models not be 
subject to arbitrary heuristics in the triangulation stage: we need correct models, in order 
to perform an accurate unfolding of the brain. 

Flattening the brain 

We have so far dealt with the cortex in its natural geometric setting: it is a curved 
two-dimensional surface. If we do not unfold the brain, we cannot observe much of its 
surface, which is hidden within the sulci of the cortex. It is also more difficult to con- 
struct models of neural architecture on curved surfaces than on planar surfaces. We have 
shown in earlier work that the curvature of V1 is not so large that we cannot represent it
with a planar model. The values of both mean and Gaussian curvature of V1 (Schwartz
and Merker 1986a) indicate that planar models of monkey cortex are acceptable. We
therefore address the problem of flattening the brain.

Figure 5 A polyhedral model of the brain which was reconstructed from serial sections.
The sections were passed through the brain peeler to create a set of contours. These
contours were parsed into generalized cylinders and critical points, and then connected by
writing a program for a simple triangulator, using no human interaction. There are 1776
triangles in this model.

There are two aspects to our method of flattening the brain. First, we specify a vari- 
ational problem. Given a polyhedral model of the cortex, we generate a random set of 
points which are constrained to lie in the plane, one point for each node of the 3D model. 
We then set up two matrices of interpoint distances: one in the plane and one on the
surface of the polyhedron. We define a least-squares goodness of fit between the two distance matrices.
We shift the points in the plane, via a standard gradient descent scheme, to maximize the 
goodness of fit between the matrices. A good degree of fit indicates a good approxima- 
tion to an isometry: a map from the 3D polyhedron to a 2D point set which preserves 
interpoint distances. 
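The variational step can be sketched as plain gradient descent on the least-squares stress, which is the sum of squared differences between planar and target distances. This sketch uses unweighted stress rather than Sammon's normalized criterion, and the step size and iteration count are arbitrary. In the flattening problem the target matrix `D` would hold surface (geodesic) distances between polyhedron nodes. The test case below instead uses a flat grid tilted in 3D, which can be flattened exactly.

```python
import numpy as np

def stress(X, D):
    """Least-squares mismatch between planar distances of X and targets D."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return 0.5 * np.sum((d - D) ** 2)            # each pair counted once

def flatten(D, iters=3000, lr=0.005, seed=0):
    """Gradient descent on stress(X, D), starting from random planar points."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(D.shape[0], 2))
    for _ in range(iters):
        diff = X[:, None, :] - X[None, :, :]
        d = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(d, 1.0)                 # avoid 0/0 on the diagonal
        coef = (d - D) / d
        np.fill_diagonal(coef, 0.0)
        grad = 2.0 * np.sum(coef[:, :, None] * diff, axis=1)
        X -= lr * grad
    return X

grid = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
surface = np.column_stack([grid[:, 0], grid[:, 1] * np.cos(0.5), grid[:, 1] * np.sin(0.5)])
D = np.linalg.norm(surface[:, None] - surface[None, :], axis=-1)
X = flatten(D)
print(stress(X, D))
```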

This algorithm, which is variously known as non-linear mapping or multidimensional
scaling (Sammon 1969), works quite well with the type of data we are discussing
here. We have found (Schwartz et al. 1989) that a mean local error of 5% is
achievable by this process, and that subsequent randomization of the 2D point set leads
to the same final configuration, indicating that the solution does not become trapped in
local minima. The main difficulty in this scheme is to obtain the interpoint distances
along the surface of the 3D polyhedron. One would think that the problem of finding discrete
geodesic distances on the surface of a (nonconvex) polyhedron would have been solved
long ago. But only in the past several years has a polynomial-time algorithm been
described for this problem (Mitchell et al. 1987). As far as we know, this very difficult
algorithm has not yet been implemented. We have developed a simpler exponential-time
algorithm, implemented it, and found that for polyhedra with several thousand nodes,
this algorithm has adequate performance on current workstation architectures. Our
geodesic distance algorithm is described in (Wolfson and Schwartz 1989). Figure 6 shows
an example of a flattened wire-frame of V1.
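For experimentation, a much cruder substitute for exact polyhedral geodesics is Dijkstra's algorithm on the mesh's edge graph. This is neither the Mitchell et al. algorithm nor the Wolfson and Schwartz method. Because paths are restricted to edges, it can only overestimate the true surface distance. The second mesh in the example shows this: the same square triangulated along the other diagonal gives 2 instead of √2.

```python
import heapq
import numpy as np

def edge_graph_distances(vertices, triangles, source):
    """Shortest-path distances from `source` along mesh edges (Dijkstra).
    This is an upper bound on the true geodesic, which may cross faces."""
    V = np.asarray(vertices, dtype=float)
    nbrs = {i: set() for i in range(len(V))}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            nbrs[u].add(v)
            nbrs[v].add(u)
    dist = [float("inf")] * len(V)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                              # stale heap entry
        for v in nbrs[u]:
            nd = d + float(np.linalg.norm(V[u] - V[v]))
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(edge_graph_distances(square, [(0, 1, 2), (0, 2, 3)], 0))   # diagonal 0-2 present
print(edge_graph_distances(square, [(0, 1, 3), (1, 2, 3)], 0))   # diagonal 1-3 instead
```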

The flattening algorithm described above provides a planar model of a cortical area,
such as V1, whose metric properties are close to those of the original 3D surface.
Although this procedure is intricate, it provides high precision data to be used for model- 
ing of columnar and topographic architectures in cortex. Having produced both the 
flattened and 3D voxel and polyhedral models, the only remaining step is to texture map 
the gray-scale values from the 3D surface into the flattened model. 

Texture Mapping 

There is a one-to-one correspondence between the triangles making up the 3D brain 
surface, and the flattened 2D model; the shape of these triangles differs very slightly, as a 
result of the small (typically <5%) flattening error. In order to map the gray-scale values 
(i.e. the image of the ODC system in the 3D surface, as in Figure 3) into the planar 
model, we only have to perform an image warp from each triangular element (in 3D) to the
corresponding triangular element (in 2D). This texture mapping takes the form of a bilinear
warp, and results in a planar image of the surface of the cortical ODC pattern (Figure 1). We
have thus obtained a flattened image of the ODC pattern in V1. Similar methods have been
applied in our lab to the reconstruction of other features of cortical architecture (e.g. 
topographic mapping) and we believe that these methods are sufficiently general to be 
applied to a wide range of problems in computational neuroanatomy. 
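The per-triangle warp can be sketched with barycentric coordinates. Each pixel of a flattened triangle is expressed as weights on its three 2D vertices, and applying the same weights to the 3D vertices gives the point to sample. This sketch uses the affine map that barycentric coordinates define, whereas the text describes the warp as bilinear. It also uses nearest-voxel sampling instead of interpolation. Coordinates are assumed to be in voxel and pixel units, and the function names are hypothetical.

```python
import numpy as np

def barycentric(p, tri2d):
    """Barycentric coordinates of 2D point p in the triangle tri2d (3, 2)."""
    a, b, c = np.asarray(tri2d, dtype=float)
    T = np.column_stack([b - a, c - a])
    l1, l2 = np.linalg.solve(T, np.asarray(p, dtype=float) - a)
    return np.array([1.0 - l1 - l2, l1, l2])

def warp_triangle(volume, tri3d, tri2d, out):
    """Give each pixel (x, y) of `out` inside tri2d the gray value of `volume`
    at the corresponding point of tri3d (nearest voxel)."""
    tri3d = np.asarray(tri3d, dtype=float)
    xs, ys = np.asarray(tri2d, dtype=float).T
    for y in range(int(np.floor(ys.min())), int(np.ceil(ys.max())) + 1):
        for x in range(int(np.floor(xs.min())), int(np.ceil(xs.max())) + 1):
            if not (0 <= y < out.shape[0] and 0 <= x < out.shape[1]):
                continue
            w = barycentric((x, y), tri2d)
            if np.all(w >= -1e-9):                  # pixel lies inside the triangle
                p = w @ tri3d                       # same weights on the 3D vertices
                i, j, k = np.clip(np.rint(p).astype(int), 0, np.array(volume.shape) - 1)
                out[y, x] = volume[i, j, k]
    return out

vol = np.arange(27).reshape(3, 3, 3)
flat = np.full((3, 3), -1)
warp_triangle(vol, [(0, 0, 0), (2, 0, 0), (0, 2, 0)], [(0, 0), (2, 0), (0, 2)], flat)
print(flat)
```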

2. Level 2: Parametric Modeling of the ODC Architecture 

The previous section described a system we have recently constructed whose pur- 
pose is to provide high resolution computer models of cortical architecture. A number of 
difficult algorithmic and implementation problems were encountered. Much support 
software for low level aspects of this work had to be constructed; the entire system 
represents many tens of thousands of lines of C code. However, the output of this system 
is merely a set of images. It is interesting to be able to inspect such images, but they 
have no meaning in and of themselves. We have stressed in this and other work that 
computer display is merely the first of several coordinated steps. In order to begin to
understand a pattern, such as the ODC system, it is necessary to model it. This modeling 
effort is strongly constrained by the data provided in the first stage. As workers in com- 
puter graphics well know, images are unforgiving of hand-waving! 

The ODC pattern bears a strong resemblance to other natural patterns, including the 
stripes of a zebra, the pattern of sand dunes in a desert, the domains in a magnetic bubble 
system, and wind-driven ocean waves. Some of these phenomena have proven amenable
to modeling via two-dimensional bandpass-filtered noise (e.g. Mastin et al. 1987). We
apply a similar approach below.

Figure 6 The flattened wire frame used to produce Figure 1.

This image-oriented approach contrasts with that of other workers, who have sought 
to construct explicit neural models of pattern generation (e.g. Miller et al. 1989;
Swindale 1980; Swindale 1982). We avoid construction of a mechanistic model of the
system, in favor of developing a simple image-oriented model which reproduces the
observed patterns. Our approach has the following strong points: 

• We can obtain maximum insight into the nature of the pattern itself by 
abstracting its properties in a simple model. We focus on the pattern itself, 
rather than on poorly understood processes in the developing nervous system. 

• By avoiding the computationally expensive details of a mechanistic model, we 
can develop a rapid simulation tool which allows us to extensively explore the 
model. 

• By abstracting the nature of the pattern in the form of a simple generative 
model, we can publish a parameterized procedure for generating these patterns, 
which spares workers wishing to reproduce the pattern the need to implement a 
complex simulation. 

We have found that an extremely simple model is capable of reproducing the pro- 
perties of the monkey ODC (and the cat ODC, as well as other columnar systems); this 
model is based on the properties of band-pass filtered white noise. 




Bandpass-Filtered Noise 

In two dimensions, bandpass filters may be classified according to whether or not 
they have a preferred orientation. When a filter has no preferred orientation, we say it is 
isotropic. In this case, the filter has an annular shape in the frequency domain (Figure 
7a); in the space domain we find a kernel which has a familiar center-surround structure 
(Figure 7b). The filter is characterized by two parameters, the center frequency and the 
width. Filtering and thresholding a noise image with an isotropic filter yields a faintly 
columnar structure (Figure 7c, d) which lacks the striking medium-range directional 
correlation of the observed ocular dominance column structure. 
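Below is a minimal sketch of isotropic bandpass-filtered noise. It assumes a Gaussian radial profile for the annulus, because the text does not specify the filter shape, and it thresholds at zero. Frequencies are in cycles per pixel, and the values passed in follow those quoted for Figure 7.

```python
import numpy as np

def isotropic_bandpass(n, fc, width):
    """Annular filter on an n x n DFT grid: Gaussian in radial frequency,
    centred on fc with spread `width` (cycles per pixel)."""
    f = np.fft.fftfreq(n)
    FX, FY = np.meshgrid(f, f, indexing="xy")
    return np.exp(-0.5 * ((np.hypot(FX, FY) - fc) / width) ** 2)

def filtered_noise(H, seed=0):
    """Filter zero-mean, unit-variance Gaussian noise by H; return the
    filtered signal and its thresholded 'columns' (signal > 0)."""
    n = H.shape[0]
    noise = np.random.default_rng(seed).standard_normal((n, n))
    signal = np.real(np.fft.ifft2(np.fft.fft2(noise) * H))
    return signal, signal > 0

signal, columns = filtered_noise(isotropic_bandpass(128, fc=0.15, width=0.10))
print(columns.mean())        # roughly half the pixels fall on each side of zero
```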

When we turn to oriented (anisotropic) filters, we find that the frequency-domain
representation has two humps, centered on points symmetric about the origin (Figure
8a), corresponding to planar cosine waveforms of particular frequency and orientation.
The space-domain kernel of the filter consists of alternating positive and negative regions 
in the direction perpendicular to the preferred orientation, with the regions elongated 
parallel to the preferred orientation (Figure 8b). Filtered noise assumes a wave-like pat- 
tern (Figure 8c); after thresholding, we obtain a figure (Figure 8d) with a strong resem- 
blance to the ocular dominance pattern at small scales. 

The anisotropic filter is slightly more complicated than the isotropic filter; we 
require three parameters for a complete description. Like the isotropic filter, the center 
frequency and width must be specified. In addition, there is an orientation. The effects 
of varying these parameters are easy to describe. Increasing the center frequency 
decreases the column width. Changing the orientation of the filter alters the direction of 
the columns. Increasing the width of the filter increases the “noisiness” of the columns, 
resulting in more variation of column width and orientation, and increased branching and 
column termination. 
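The anisotropic case replaces the annulus with two humps. This sketch assumes the humps are Gaussian. Here `theta` is the direction of the wave vector, so the stripes run perpendicular to it. The default width of two-thirds of the center frequency borrows the choice the authors report for their synthetic patterns. The function names are hypothetical.

```python
import numpy as np

def anisotropic_bandpass(n, fc, width, theta):
    """Two Gaussian humps at +/- fc*(cos theta, sin theta) on an n x n DFT grid.
    theta is the wave-vector direction, i.e. across the stripes."""
    f = np.fft.fftfreq(n)
    FX, FY = np.meshgrid(f, f, indexing="xy")
    kx, ky = fc * np.cos(theta), fc * np.sin(theta)
    d_plus = (FX - kx) ** 2 + (FY - ky) ** 2
    d_minus = (FX + kx) ** 2 + (FY + ky) ** 2
    return np.exp(-0.5 * d_plus / width ** 2) + np.exp(-0.5 * d_minus / width ** 2)

def oriented_columns(n=128, fc=0.1, theta=0.0, width=None, seed=0):
    """Threshold oriented bandpass-filtered Gaussian noise at zero."""
    if width is None:
        width = 2.0 * fc / 3.0
    H = anisotropic_bandpass(n, fc, width, theta)
    noise = np.random.default_rng(seed).standard_normal((n, n))
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * H)) > 0

cols = oriented_columns(fc=0.1, theta=np.pi / 4)
print(cols.mean())
```

In this parameterization, raising `fc` narrows the columns and changing `theta` rotates them. Raising `width` makes them noisier, with more branching and terminations.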

Analysis of the Pattern of Ocular Dominance 

The small-scale regularity of the pattern of ocular dominance is obvious; casual 
observation reveals that in any small region of the image, the columns are roughly paral- 
lel with relatively constant width. This leads us to expect that small subimages will have 
strong spectral components with direction and frequency consistent with the local direc- 
tion and width of the columns. We observed exactly this phenomenon. We measured 
the orientation and frequency of the peaks in the normalized spectra of overlapping 
subimages, obtaining the variation of these parameters over the flattened brain region. 
Although there was considerable local variation of these parameters, we attribute most of 
the variation to poor resolution of the data. After smoothing the measured parameters, 
we concluded that the column width (center frequency) was largely constant
over the brain region, while the orientation varied smoothly and consistently over
large subregions, probably in correlation with gross brain anatomy.
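The measurement in each subimage reduces to locating the peak of its power spectrum. This sketch assumes a square subimage, a Hann taper to limit edge leakage, and a single dominant peak. The orientation it returns is that of the wave vector, taken modulo π, so the columns themselves run at right angles to it.

```python
import numpy as np

def spectral_peak(sub):
    """Centre frequency (cycles/pixel) and wave-vector orientation (radians,
    modulo pi) of the strongest spectral component of a square subimage."""
    n = sub.shape[0]
    win = np.outer(np.hanning(n), np.hanning(n))   # taper to reduce edge leakage
    P = np.abs(np.fft.fft2((sub - sub.mean()) * win)) ** 2
    P[0, 0] = 0.0                                  # ignore any residual DC
    i, j = np.unravel_index(np.argmax(P), P.shape)
    f = np.fft.fftfreq(n)
    fy, fx = f[i], f[j]
    return float(np.hypot(fx, fy)), float(np.mod(np.arctan2(fy, fx), np.pi))

# A test grating with wave vector (8, 4) cycles per 64 pixels.
yy, xx = np.mgrid[0:64, 0:64]
grating = np.cos(2 * np.pi * (8 * xx + 4 * yy) / 64)
print(spectral_peak(grating))
```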




Figure 7 Two-dimensional demonstration of the qualitative characteristics of unoriented
(isotropic), bandpass-filtered noise, shown as intensity plots. (a) A typical isotropic
bandpass filter in the frequency domain (f_c = 0.15, δ = 0.10). (b) The (space-domain)
kernel of the filter shown in (a). (c) The convolution of the kernel in (b) with zero-mean, unit
standard deviation Gaussian noise. (d) The “columns” which result when the signal in (c)
is thresholded.

Synthesis of the Pattern of Ocular Dominance 

Although the spectra of the subregions which we analyzed were complicated, we 
found that we could easily obtain subjectively satisfying synthetic patterns. We con- 
structed for each subregion an anisotropic bandpass filter, with the filter center frequency 
and orientation set to reflect the characteristics of the subregion’s spectral peaks, which 
we obtained from the measurements described above. We somewhat arbitrarily fixed the 
width parameter to 2/3 of the center frequency, since this yielded satisfactory results. We 
applied this filter to Gaussian noise and thresholded the result; a typical example is 
shown in Figure 9. 

Although this method yielded a good model of the column pattern on a small scale,
we needed further refinements to apply the technique to the entire pattern. We
constructed a “seed” noise image in register with the flattened brain image. We moved
a window over the two images, applying to the underlying noise seed the parametric filter
that was obtained from the smoothed parameters measured on the corresponding
subimage in the actual ocular dominance pattern. As described above, this required the
extraction of two parameters (orientation and center frequency) from the subimage of the
actual ocular dominance pattern and a smoothing operation on the parameters thus determined.

Figure 8 Two-dimensional demonstration of the qualitative characteristics of oriented
(anisotropic) bandpass-filtered noise, shown as intensity plots. (a) The frequency-domain
representation of an anisotropic bandpass filter with center frequency = 0.05, bandwidth =
0.08. (b) The (space-domain) kernel of the filter shown in (a). (c) The convolution of the
kernel in (b) with zero-mean, unit standard deviation Gaussian noise. (d) The “columns”
which result when the signal in (c) is thresholded.




Figure 9 Examples of subimage analysis from the flattened brain. (a) A typical “good”
subimage. (b) The spectrum of (a). From this spectrum, we derived filter parameters
f_c = 0.157, θ = 84 deg. We also use δ = 0.67 f_c and δ = 1. (c) An image synthesized by
application of the filter parameters derived from (b) to Gaussian noise, followed by
thresholding. (d) The spectrum of the synthetic image before thresholding (i.e. the
convolution of the derived filter with Gaussian noise).

We then extracted a subimage from the synthetic pattern, which was used to “tile” a 
synthetic pattern of ocular dominance corresponding to the entire measured pattern. 
Lastly, the synthetic pattern was smoothed with a median operation. The result is shown 
in Figure 10. 







Figure 10 Comparison of actual and synthetic column data. (a) Actual column data from
the flattened brain. (b) Synthetic columns generated with parametric filtering and blending.



We can summarize the synthesis of the ocular dominance pattern: 

Algorithm ODC_synthesis

    For each position of the movable window
        extract the local column pattern from the flattened brain data
        determine the spectral parameters (center frequency and orientation)
    Smooth the measured parameters
    Generate a noise seed in register with the brain data
    For each position of the movable window
        extract the registered noise subimage from the noise seed
        construct a filter according to the smoothed spectral parameters
        filter and threshold the noise subimage
        extract the central region of the subimage
    Combine all the extracted regions in their proper positions
        (* these regions are non-overlapping and exhaustively cover the original *)
    Smooth the synthetic columns
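The algorithm above can be turned into a compact runnable sketch. Several choices are our own assumptions rather than the authors': the window and core sizes, reflect padding at the borders, the 3×3 median, and the 3×3 box smoothing of parameters. That smoothing averages orientations as doubled angles, because orientation is only defined modulo π. The two-thirds width rule follows the text. Each window contributes only its central core, so the cores are non-overlapping and cover the image, as in the pseudocode. The helper names are hypothetical.

```python
import numpy as np

def _peak(sub):
    """Centre frequency and wave-vector angle of the strongest spectral peak."""
    n = sub.shape[0]
    win = np.outer(np.hanning(n), np.hanning(n))
    P = np.abs(np.fft.fft2((sub - sub.mean()) * win)) ** 2
    P[0, 0] = 0.0
    i, j = np.unravel_index(np.argmax(P), P.shape)
    f = np.fft.fftfreq(n)
    return np.hypot(f[j], f[i]), np.arctan2(f[i], f[j])

def _bandpass(n, fc, theta, width):
    """Two Gaussian humps at +/- fc*(cos theta, sin theta)."""
    f = np.fft.fftfreq(n)
    FX, FY = np.meshgrid(f, f, indexing="xy")
    kx, ky = fc * np.cos(theta), fc * np.sin(theta)
    d1 = (FX - kx) ** 2 + (FY - ky) ** 2
    d2 = (FX + kx) ** 2 + (FY + ky) ** 2
    return np.exp(-0.5 * d1 / width ** 2) + np.exp(-0.5 * d2 / width ** 2)

def _box3(a):
    """Mean over each 3x3 neighbourhood (edges replicated)."""
    p = np.pad(a, 1, mode="edge")
    return sum(p[i:i + a.shape[0], j:j + a.shape[1]] for i in range(3) for j in range(3)) / 9.0

def _median3(a):
    """3x3 median filter (edges replicated)."""
    p = np.pad(a, 1, mode="edge")
    stack = np.stack([p[i:i + a.shape[0], j:j + a.shape[1]] for i in range(3) for j in range(3)])
    return np.median(stack, axis=0).astype(a.dtype)

def odc_synthesis(columns, win=32, core=16, seed=0):
    """Synthesize columns matching `columns` (sides must be multiples of core)."""
    H, W = columns.shape
    pad = (win - core) // 2
    data = np.pad(columns.astype(float), pad, mode="reflect")
    rows, cols = H // core, W // core
    fc = np.zeros((rows, cols))
    orient = np.zeros((rows, cols), dtype=complex)
    for r in range(rows):                                   # measure
        for c in range(cols):
            f, th = _peak(data[r * core:r * core + win, c * core:c * core + win])
            fc[r, c], orient[r, c] = f, np.exp(2j * th)     # doubled angle
    fc, theta = _box3(fc), np.angle(_box3(orient)) / 2      # smooth parameters
    noise = np.random.default_rng(seed).standard_normal(data.shape)  # registered seed
    out = np.zeros((H, W), dtype=np.uint8)
    for r in range(rows):                                   # synthesize
        for c in range(cols):
            sub = noise[r * core:r * core + win, c * core:c * core + win]
            width = max(2.0 * fc[r, c] / 3.0, 1e-3)
            filt = np.real(np.fft.ifft2(np.fft.fft2(sub) * _bandpass(win, fc[r, c], theta[r, c], width)))
            out[r * core:(r + 1) * core, c * core:(c + 1) * core] = \
                filt[pad:pad + core, pad:pad + core] > 0    # keep the central core
    return _median3(out)

yy, xx = np.mgrid[0:64, 0:64]
observed = (np.cos(2 * np.pi * xx / 10.0) > 0).astype(np.uint8)   # toy "data"
synthetic = odc_synthesis(observed)
print(synthetic.mean())
```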



Synthetic Columns as a Model for Ocular Dominance 

The synthetic columns which we have constructed seem to successfully capture the
qualitative characteristics of the actual pattern of ocular dominance. The model embodies
the strong points which we have mentioned earlier. This system is extremely fast
(we can generate a high-resolution model of the full primate ODC on a workstation in
several minutes). It is a simple parameterization which other workers can easily use.
The ability to economically simulate columnar systems with high accuracy is of great
importance to further modeling of the properties of visual cortex.

We believe that the properties of filtered, thresholded noise underlie the mechanisms of all of the related column systems (zebra stripes, magnetic bubbles, sand dunes and ocean waves). The kernels associated with the anisotropic bandpass filters are probably an essential component of any realistic model of primate ocular dominance, whether they are present as oriented receptive fields or as correlated input. An underlying noise source is also necessary to allow symmetric initial conditions to bifurcate into two sets of columns (symmetry breaking). The threshold operation captures the "positive feedback with bounds" that is implicit in models which modify connectivity according to correlated activity yet do not permit unbounded growth of connection strengths. Thus our model highlights the essential properties required to generate the ocular dominance structure, without the need to specify low-level developmental processes.

3. Level 3: Computational Significance of the ODC Architecture 

In the preceding sections, we presented a detailed discussion of the measurement 
and parametric modeling of the ODC system. We have not yet considered the function 
of the system; what possible computational significance might there be to formatting 
binocular data in the form of the ODC system? 

We now review recent work (Yeshurun and Schwartz 1989) describing a simple nonlinear filter (the cepstrum) that can use a columnar data format to extract binocular disparity between paired images. In that work, we showed that the column/cepstrum model has properties consistent with the limitations of human stereo vision.

Stereo Segmentation and the ODC pattern 

The lateral offset of the two eyes introduces differences between the two images projected to the cortex. The positional and orientational shift of an image feature from one image to the other is the binocular disparity of that feature. Many algorithms for computing binocular disparity have been proposed over the past decade or so. Some algorithms locate salient features in each image and then match them between images; these suffer from the ambiguity created by multiple instances of the same feature in an image. Other algorithms systematically shift one image to try to bring it into alignment with the other (correlation). These are computationally expensive: when there are $N$ pixels in an "eye" or image to be processed, direct correlation requires computation of order $N^2$.

In the ODC system, we observe that data from the two eyes are formatted in paired stripes. This suggests an algorithm that uses interleaved data from the two eyes. The columnar layout can be characterized as a kind of "visual echo": a small patch of the left-eye view of the scene is echoed in the right-eye view with a shift of one column width plus the binocular disparity we are trying to measure. A well-known and highly effective algorithm for echo detection in auditory processing is the cepstral filter (Bogert et al. 1963). The cepstrum is the power spectrum of the log of the power spectrum of a signal. Because of the two power spectra, the units of the cepstrum revert to the domain of the original signal; for an audio signal, the cepstrum has units of time. It can be shown that the cepstrum of an auditory signal with an echo has a prominent peak at a position corresponding to the time delay of the echo. This is because an echo of relative amplitude $a$ and delay $\tau$ multiplies the spectrum by $1 + a e^{-i\omega\tau}$. The logarithm turns this product into an additive term that is periodic in frequency with period $2\pi/\tau$, so the second power spectrum concentrates it at the "quefrency" $\tau$.
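A one-dimensional version of this echo detector takes only a few lines of Python. The sketch below assumes a naive DFT and adds a small constant `eps` to keep the logarithm finite. The test signal is hypothetical: a short decaying pulse plus an attenuated, delayed copy. The function names are illustrative.

```python
import cmath
import math


def dft(seq):
    """Naive O(n^2) discrete Fourier transform (stdlib only)."""
    n = len(seq)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(seq))
            for k in range(n)]


def power_cepstrum(signal, eps=1e-12):
    """Power spectrum of the log of the power spectrum."""
    log_power = [math.log(abs(c) ** 2 + eps) for c in dft(signal)]
    return [abs(c) ** 2 for c in dft(log_power)]


def echo_delay(signal):
    """Quefrency (in samples) of the strongest cepstral peak, skipping q = 0."""
    ceps = power_cepstrum(signal)
    # a real signal gives a symmetric cepstrum, so half the range suffices
    return max(range(1, len(signal) // 2 + 1), key=lambda q: ceps[q])


if __name__ == "__main__":
    x = [0.0] * 128
    x[0], x[1] = 1.0, 0.5      # short decaying pulse
    x[37], x[38] = 0.7, 0.35   # its echo: attenuated by 0.7, delayed by 37 samples
    print(echo_delay(x))       # 37
```

The pulse's own shape contributes only at low quefrencies. The echo contributes a peak at its delay and weaker ones at integer multiples of it.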

To explore this idea, we implemented a two-dimensional version of the cepstrum and applied it to pairs of images formatted as small paired patches. The relative scale of the images and patches was chosen to be consistent with normal viewing conditions and with the anatomical scale of the ODC system. Figure 11a shows a gray-scale image subtending about 8 degrees of arc. Each subimage shown in Figure 11b is about one column width (5 minutes of arc). The data are shown as two copies of the same section of the large image, taken from the position of the arrow. The cepstrum (Figure 11c) applied to this pair gives a strong peak at a position corresponding to a shift of one column width. In Figure 12, we demonstrate stereo segmentation of a large (3500 by 3500) random dot stereogram containing a pac-man figure defined by correlated binocular cues. We ran the cepstrum on small windows corresponding to square patches the size of ocular dominance columns. This approach produced a rapid segmentation of this extremely large binocular stereo problem (Yeshurun and Schwartz 1989).
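The paired-patch arrangement can be illustrated with a small two-dimensional version of the same computation. This is a toy sketch, not the Yeshurun and Schwartz implementation, and several of its choices are assumptions:

- the patch size;
- the 2×2 "feature" standing in for image texture;
- the 0.8 contrast factor on the right-eye copy, which also keeps the log spectrum finite;
- the zero padding, which keeps the offset (one column width plus disparity) below half the transform width so that it is not aliased.

All function names are hypothetical.

```python
import cmath
import math


def dft(seq):
    """Naive O(n^2) 1D discrete Fourier transform (stdlib only)."""
    n = len(seq)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(seq))
            for k in range(n)]


def dft2(img):
    """Separable 2D DFT: rows first, then columns."""
    rows = [dft(r) for r in img]
    cols = [dft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def cepstrum2(img, eps=1e-12):
    """2D power cepstrum: |DFT2(log |DFT2(img)|^2)|^2."""
    log_power = [[math.log(abs(v) ** 2 + eps) for v in row] for row in dft2(img)]
    return [[abs(v) ** 2 for v in row] for row in dft2(log_power)]


def paired_window(left, right, pad):
    """Place the left- and right-eye patches side by side, then zero-pad."""
    return [lrow + rrow + [0.0] * pad for lrow, rrow in zip(left, right)]


def disparity(left, right):
    """Horizontal disparity = cepstral peak position minus one column width."""
    width = len(left[0])
    img = paired_window(left, right, pad=2 * width)
    ceps = cepstrum2(img)
    # a purely horizontal echo shows up on the q_y = 0 row; skip q_x = 0
    q = max(range(1, len(img[0]) // 2 + 1), key=lambda qx: ceps[0][qx])
    return q - width


if __name__ == "__main__":
    H, W, d = 8, 16, 3
    feature = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.25}
    left = [[0.0] * W for _ in range(H)]
    right = [[0.0] * W for _ in range(H)]
    for (dy, dx), v in feature.items():
        left[2 + dy][4 + dx] = v
        right[2 + dy][4 + d + dx] = 0.8 * v
    print(disparity(left, right))  # 3
```

The peak sits at one column width plus the disparity. This matches the "visual echo" description above, in which the columnar offset is a fixed, known part of the shift.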

From a computational point of view, the column/cepstrum algorithm has very good complexity: it is of order $N \log N$ in the number of pixels. From a biological point of view, the algorithm can be implemented by a system of neurons capable of extracting power spectral estimates of the "cortical image". But this is precisely one of the basic features of cortical neurons: they are medium-band spatial filters, with bandwidths typically of about 1.5 octaves. In other work, we have shown that a two-dimensional cepstrum can be effective for stereo segmentation when implemented either with a standard FFT algorithm (computational application) (Yeshurun and Schwartz 1989) or with medium-bandwidth spatial filters (neural application) (Yeshurun and Schwartz 1988).
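As a rough, order-of-magnitude illustration of the gap between direct correlation and the column/cepstrum approach, consider the 3500 by 3500 pixel stereogram of Figure 12. The arithmetic below counts leading terms only and ignores constants:

$$ N = 3500^2 = 1.225 \times 10^{7}, \qquad N^2 \approx 1.5 \times 10^{14}, \qquad N \log_2 N \approx 1.225 \times 10^{7} \times 23.5 \approx 2.9 \times 10^{8}, $$

a ratio of $N / \log_2 N \approx 5 \times 10^{5}$ in favor of the $N \log N$ method.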

There are a number of interesting consequences of the column/cepstrum algorithm related to its columnar data format. Because of the windowed nature of the algorithm, it cannot analyze very rapid changes in disparity (i.e. variations over scales smaller than the column width). There should therefore be natural scale factors in behavioral aspects of stereo vision that are set by the size of ocular dominance columns. We have discussed these characteristic properties elsewhere (Yeshurun and Schwartz 1988). There we show that these predicted algorithmic properties correspond to the psychophysical details of human binocular stereo vision.

In summary, we have shown that the ODC pattern of interlaced inputs from the left and right eyes has an intriguing relationship to the functional aspects of stereo vision, to which this system is almost certainly deeply related. From a computational point of view, cepstral stereo has very attractive performance characteristics. Recently, Ballard and Brown have implemented the cepstral algorithm in hardware and have used it to control the binocular vergence of a machine vision binocular camera system.

Figure 11 (a) A natural scene, photographed with a stereo camera technique. (b) The arrow in (a) points to a tiny portion of the scene, comprising about 5 minutes of arc, which is reproduced here in magnified form. (c) The cepstrum of (b), which represents a small window of a stereo frame. This simulates the way these data would be presented in the brain via the ocular dominance column system: as a pair of small (left/right) image patches, with a small disparity component of shift added to the normal columnar offset. The bright peaks in the figure represent the magnitude of the binocular disparity (modulo the column size).

Conclusion and Prospects 

This paper has developed several themes related to the visualization and conceptualization of patterns of functional architecture in the mammalian brain. In order to visualize a brain pattern, one must first deal with the extremely challenging problems associated with reconstructing it from raw data. We described some half-dozen algorithms that we developed for this purpose. We illustrated their application to a single architectural feature of primate visual cortex, the ocular dominance column system of V1. We showed that bandpass-filtered, thresholded white noise provides an economical graphical model of this pattern. We outlined an algorithmic justification for the presence of columns in a visual system that computes binocular disparity: in conjunction with a simple nonlinear filter (the cepstrum), we obtained a fast and robust algorithm for stereo segmentation.

Figure 12 A segmentation of a simulated 8 by 8 degree random dot stereogram (RDS, 3500 by 3500 pixels) by the column/cepstrum algorithm. The figure can be segmented only by binocular disparity cues. Taking windows of 32 by 32 pixels each, the method produces a segmented image of 100 by 100 pixels, simulating 5-minute windows over the entire image.

Application to other neural systems 

In addition to the ODC system, there are many other examples of columnar architecture in the brain. For example, neurons that respond to edges of similar orientation are grouped together into "orientation columns" in V1. Neurons that respond to a similar direction of movement are grouped together into "direction columns" in a cortical area specialized for motion (area MT). Columnar grouping has been observed in frontal cortex, and secondary visual cortex (V2) has at least three interlaced column systems. This list will undoubtedly expand with continued research. We feel that the presence of columns, that is, of groups of neurons in periodic patterns on a scale that is large compared to the size of single neurons, is one of the basic architectural schemes of neocortex.

Figure 13 Application of the cepstrum to curvature computation. (a) Geometry of the model. A contour C passes through the receptive fields of two "hypercolumns", shown here as boxes C1 and C2. The contour has approximate orientation $\theta_1 = 2.618$ radians in C1 and $\theta_2 = 2.094$ radians in C2. The approximate curvature of the contour in the region of C1 and C2 is given by $\delta\theta/\delta s$, with $\delta\theta = \theta_2 - \theta_1 = -0.524$ radian and $\delta s$ the arc length shown. (b) Representation of $\theta_1$ and $\theta_2$ as distributions of activity along adjacent hypercolumns. (c) Positive part of the cepstrum of the signal in (b). Note the presence of a peak at 5.7596 radians, corresponding to $\delta\theta + 2\pi$.

Many workers have speculated about the existence of these column systems: why are they so common? The results summarized in this paper suggest two answers: columns are easily constructed, and they are computationally useful. We have shown that the columnar pattern can be generated by an almost trivial mechanism: bandpass filtering followed by thresholding. The space-domain dual of bandpass filtering is correlation. Our studies thus suggest that any mechanism which introduces local correlation of a form dual to the bandpass filters we have described will result in column systems of the kind actually observed in the brain. We should not be terribly surprised to see columns: they are one of the simplest patterns to build!

The ease of construction couples nicely with the functional utility of column systems. When one wishes to extract the difference between two similar signals, the column/cepstrum architecture is ideal. Using the notion of generalized difference mapping, we can suggest another application to the processing of visual information. Sensitivity to visual "edges", i.e. abrupt changes of contrast, is also formatted across the surface of cortex in a periodic fashion, in the form of "orientation columns." V1 comprises a periodic orientation map as well as a periodic stereo map. Over a fixed arc length, changes in orientation are proportional to curvature ($\kappa \approx \delta\theta/\delta s$), and curvature is one of the principal shape descriptors that the visual system must analyze. In Figure 13, we briefly illustrate how the cepstral operator, applied to orientation columns, yields a strong peak at a position that encodes the change in orientation of a contour, and hence approximates its local curvature. The important point is that the same column/cepstrum architecture used to compute binocular disparity can be applied to the computation of contour curvature, illustrating the power of a generalized difference map. This economy of architecture may be the prototype of a generic computational module in neocortex.
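The Figure 13 computation can be mimicked with the same one-dimensional cepstrum. This sketch makes several assumptions:

- Each hypercolumn is reduced to 24 orientation bins spanning 0 to $2\pi$.
- Each contour orientation activates a single bin, a deliberate simplification of the activity distributions in Figure 13b.
- The second hypercolumn's response is scaled by 0.8 to keep the log spectrum finite.
- The signal is zero-padded.

The function names are hypothetical. With the orientations from Figure 13, the cepstral peak falls at $2\pi + \delta\theta \approx 5.7596$ radians. Dividing $\delta\theta$ by an assumed arc length $\delta s$ then gives the curvature estimate.

```python
import cmath
import math


def power_cepstrum(signal, eps=1e-12):
    """Power spectrum of the log of the power spectrum (naive DFT)."""
    n = len(signal)

    def dft(seq):
        return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(seq))
                for k in range(n)]

    log_power = [math.log(abs(c) ** 2 + eps) for c in dft(signal)]
    return [abs(c) ** 2 for c in dft(log_power)]


def hypercolumn(theta, bins, gain=1.0):
    """Activity along one hypothetical hypercolumn: one active orientation bin."""
    act = [0.0] * bins
    act[round(theta / (2 * math.pi / bins)) % bins] = gain
    return act


def cepstral_peak_angle(theta1, theta2, bins=24):
    """Cepstral peak position in radians (one hypercolumn spans 2*pi)."""
    step = 2 * math.pi / bins
    signal = (hypercolumn(theta1, bins) + hypercolumn(theta2, bins, 0.8)
              + [0.0] * (2 * bins))  # zero padding against circular wrap-around
    ceps = power_cepstrum(signal)
    q = max(range(1, 2 * bins + 1), key=lambda k: ceps[k])
    return q * step


def curvature(theta1, theta2, ds, bins=24):
    """delta-theta / delta-s, with delta-theta read off the cepstral peak."""
    dtheta = cepstral_peak_angle(theta1, theta2, bins) - 2 * math.pi
    return dtheta / ds


if __name__ == "__main__":
    print(round(cepstral_peak_angle(2.618, 2.094), 4))  # 5.7596
```

The orientation change is only resolved to one bin ($\pi/12$ here). Finer estimates would need broader activity distributions or more bins.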

The orientation column pattern may be generated by thresholding isotropic bandpass-filtered noise. The issues involved in visualizing and measuring the orientation column system are similar to those outlined above for the ODC system. All of the methods and demonstrations described for the ODC system could be applied to the orientation column system, given appropriate raw data. The same may well hold true for the other column systems of the cortex.

The principal theme of this work is that cortex is a highly patterned "machine" whose purpose is to recognize patterns. Our goal has been to uncover general cortical mechanisms by applying the methods of computer graphics and numerical modeling to visualize those patterns which are the basis of our ability to visualize patterns.

References 

Anjyo, K., Ochi, T., Usami, Y., and Kawashima, Y. (1987), A practical method of constructing surfaces 
in three-dimensional digitized space, The Visual Computer 3(1): 4-12. 

Artzy, E., Frieder, G., and Herman, G.T. (1981), The theory, design, implementation and evaluation of a 
three-dimensional surface detection algorithm, Computer Graphics and Image Processing 15: 1-24. 

Bogert, B.P., Healy, M.J.R., and Tukey, J.W. (1963), The frequency analysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking, Proc. Symp. Time Series Analysis, John Wiley and Sons, New York, pp. 209-243.

Frederick, C., and Schwartz, E.L. (1990), The brain peeler: viewing the inside of a three dimensional shell, Visual Computer 6(1): 37-49.

Fuchs, H., Kedem, Z.M., and Uselton, S.P. (1977), Optimal surface reconstruction from planar contours, Communications of the ACM 20: 693-702.

LeVay, S., Hubel, D.H., and Wiesel, T.N. (1975), The pattern of ocular dominance columns in macaque visual cortex revealed by a reduced silver stain, J. Comp. Neurol. 159: 559-576.

Mastin, G.A., Watterberg, P.A., and Mareda, J.F. (1987), Fourier synthesis of ocean waves, IEEE 
Computer Graphics and Applications 7: 16-23 (March). 

Miller, K.D., Keller, J.B., and Stryker, M.P. (1989), Ocular dominance column development: analysis and 
simulation, Science 245: 605. 

Mitchell, J.S.B., Mount, D.M., and Papadimitriou, C.H. (1987), The discrete geodesic problem, SIAM J. 
Comput. 16: 647-668. 

Sammon, J.W. (1969), A nonlinear mapping for data-structure analysis, IEEE Trans. on Computers C-18: 401-409.

Schwartz, E.L., and Merker, B. (1986), Computer-aided neuroanatomy: differential geometry of cortical 
surfaces and an optimal flattening algorithm, IEEE Computer Graphics and Applications 6(2): 36-44 
(March). 

Schwartz, E.L., and Merker, B. (1986), Three-dimensional computer reconstruction of the ocular dominance column pattern of macaque striate cortex: a digital tangential microtome, Invest. Ophthal. and Vis. Sci. 27: 223.

Schwartz, E.L., Merker, B., Wolfson, E., and Shaw, A. (1988), Computational neuroscience: Applications 
of computer graphics and image processing to two and three dimensional modeling of the functional 
architecture of visual cortex, IEEE Computer Graphics and Applications 8(4): 13-28 (July). 

Schwartz, E.L., Shaw, A., and Wolfson, E. (1989), A numerical solution to the generalized mapmaker's problem, IEEE Trans. Pattern Analysis and Machine Intelligence 11: 1005-1008.

Schwartz, E.L., (1990), Introduction, in: Computational Neuroscience , E.L. Schwartz ed. Cambridge, 
MA: MIT Press. 

Swindale, N.V. (1980), A model for the formation of ocular dominance column stripes, Proc. Roy. Soc. 
Lond. B 208: 243-264. 

Swindale, N.V. (1982), A model for the formation of orientation columns, Proc. Roy. Soc. Lond. B 215: 211-230.

Wallace, A.H. (1968), Differential Topology , New York: W. A. Benjamin. 

Wolfson, E., and Schwartz, E.L. (1989), Computing minimal distances on arbitrary polyhedral surfaces, IEEE Trans. on Pattern Analysis and Machine Intelligence 11: 1001-1005.

Yeshurun, Y., and Schwartz, E.L. (1988), Ocular dominance columns in macaque V1: a two dimensional stereomap, in: Computational Neuroscience, E. Schwartz ed. Cambridge MA: MIT Press.

Yeshurun, Y., and Schwartz, E.L. (1989), Cepstral filtering on a columnar image architecture: a fast 
algorithm for binocular stereo segmentation, IEEE Trans. Pattern Analysis and Machine Intelligence 11(7): 
759-767. 




DYNAMIC VISION 

Harry Wechsler* and Lee Zimmerman** 



*Computer Science
George Mason University 
Fairfax, VA 22030 
USA 



**Electrical Engineering
University of Minnesota 
Minneapolis, MN 55455 
USA 



1) INTRODUCTION



The fundamental problem faced by all vision systems is the ambiguity created by the projection process. An object's projected shape in an image can change dramatically for small changes in the observer's viewpoint. This is the basic difficulty in creating a machine vision system that can respond robustly in an unconstrained 3D environment. Our approach to this problem enables the vision system to engage actively in interpreting its surroundings. It does so using a distributed memory system modeled as visual potentials and made up of characteristic view(point)s (Koenderink and van Doorn, 1979).

The suggested vision system consists of a feedback loop: recognition affects depth interpretation, and depth interpretation affects recognition. The input to our system is a 2D image and a sparse depth map (a viewer-centered map of the distances to visible surface points). A sparse depth map can be obtained from locality systems, which are modular units that use cues such as stereo, shading, or motion to estimate depth (the distance from the observer to each point in the image). The output from the system is a dense depth map and the corresponding classification of the visible surfaces. Our system is thus a dynamic 3D image interpretation system in which location and recognition work hand in hand to produce solid and stable results.

The vision system consists of four main components, which are shown graphically in Fig. 1. It begins with the formation of the depth map. The depth map formation system blends information from visual distance cues such as motion or stereo with recognition coefficients from the memory, appropriately 'filtered' through the active perception system, to create a smooth dense map. The depth plane is the result of the relaxation of a thin plate (Terzopoulos et al., 1987) across high-confidence distances. The reprojection system uses the intensity image and the depth map to produce a distorted image: a flattened, frontal version of all visible surfaces. This flattened image is then analyzed by the memory system. The distributed associative memory (DAM) system used is similar to the rotation- and scale-invariant memory system described by Wechsler and Zimmerman (1988). High confidence from the recognition system for an area of the image will solidify the depth interpretation in that area. Characteristic views (aspects) are stacked together within the DAM. The dynamics of the visual system engage in active exploration to seek the view needed to enhance the interpretation process. The volume of space being examined by the reprojection system, the scale of the features, and the stability of the depth map are under the guidance of the active perception system (Bajcsy, 1988). The active vision system alters the interpretation process by choosing new viewpoints to be examined.



2) DEPTH MAP FORMATION 



The depth map formation system uses a relaxation process to blend location and recognition information. Often this information is sparse or imprecise across the field of view, so relaxation is needed to interpolate and smooth it. The depth map formation system consists of three planes (confidence, distance, and thin plate or membrane), which are in registration with the input image. The thin membrane is the dense depth map composed from the anchor points, which are held in the distance plane and initialized by the locality systems.

The confidence plane receives information from the memory (filtered through the active perception system) and from the locality systems. The locality systems produce sparse distance information, irregularly spaced across the field of view.

Each point in the confidence plane indicates the confidence in the perceived distance value. When confidence is high, the thin membrane is frozen and not allowed to change; when confidence is low, the thin membrane can change a great deal. The distance plane contains the anchor points used by the relaxation process. High confidence from the recognition system for an area of the image will solidify the depth interpretation in that area.

The depth map is formed from sparse depth information in the following way. First, the data are interpolated using a thin plate. The energy function used to enforce the constraints among the grid points is similar to that used to implement snakes (Kass et al., 1987). It is composed of two terms: one enforces the constraint that the derived depth map be close to the actual range data; the other seeks the flattest membrane that will fit the data. The stiffness of the plate is slowly reduced, resulting in a membrane interpolation that captures the surfaces indicated by the data. This is done using successive relaxation steps.
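A one-dimensional sketch of this confidence-weighted relaxation follows. It assumes a membrane (first-difference) smoothness term rather than a true thin plate, with energy $E(u) = \sum_i c_i (u_i - d_i)^2 + \lambda \sum_i (u_{i+1} - u_i)^2$. It also assumes Gauss-Seidel updates and a hypothetical stiffness schedule. The names and parameter values are illustrative. High-confidence anchors pin the surface, while zero-confidence points are filled in by their neighbors.

```python
def relax_depth(depth, confidence, stiffness=(4.0, 2.0, 1.0), sweeps=500):
    """Confidence-weighted membrane interpolation of sparse 1D depth data.

    depth[i] is ignored wherever confidence[i] == 0.
    stiffness is a hypothetical decreasing schedule for lambda (must stay > 0).
    """
    n = len(depth)
    total = sum(confidence)
    if total == 0:
        raise ValueError("need at least one confident anchor")
    # start unknown points at the confidence-weighted mean depth
    start = sum(c * d for c, d in zip(confidence, depth)) / total
    u = [d if c > 0 else start for d, c in zip(depth, confidence)]
    for lam in stiffness:              # stiffness is slowly reduced
        for _ in range(sweeps):        # successive relaxation (Gauss-Seidel) sweeps
            for i in range(n):
                nbrs = [u[j] for j in (i - 1, i + 1) if 0 <= j < n]
                # minimizer of E with respect to u[i], other values held fixed
                u[i] = ((confidence[i] * depth[i] + lam * sum(nbrs))
                        / (confidence[i] + lam * len(nbrs)))
    return u


if __name__ == "__main__":
    depth = [5.0] + [0.0] * 9 + [15.0]
    conf = [1e6] + [0.0] * 9 + [1e6]
    print([round(v, 2) for v in relax_depth(depth, conf)])  # linear fill-in, about 5..15
```

Each update sets a point to a confidence-weighted blend of its own datum and its neighbors. A very large confidence therefore effectively freezes a point, as the text describes.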



3) REPROJECTION 



When an object rotates about an axis perpendicular to the image plane, its projection rotates but does not change shape. If the same object rotates about an axis parallel to the image plane (rotation in depth), however, its projected shape can change dramatically. This is due to the appearance of new surfaces, foreshortening, and perspective distortions. Foreshortening occurs when the surface being viewed slants away from the viewer.

Perspective projection occurs in all imaging devices and causes parallel lines on a surface to become converging lines in the image plane. Our computer vision system tries to minimize this problem by 'reprojecting' the surfaces being viewed to a flat, frontal position. The reprojection system must satisfy several criteria. First, the reprojection function should compensate for the distortion of a single disconnected planar surface; this requires that surface markings, such as writing, be invariant to the slant of the surface. Second, the reprojection function should topologically map an object in an image to a characteristic shape. Third, the reprojection function should be able to zoom in on an area of the image, limiting the spatial extent of the processing. Following these criteria, the whole system should be able not only to recognize an object from its characteristic surface shapes but also to recognize surface markings.

The reprojection function assumes that each pixel is an area of intensity at a distance given by the dense depth map (thin plate). The area of the pixel is expanded first horizontally and then vertically, from the center axis outward. Each pixel is expanded according to the relative distance between itself and its neighbor along the direction of expansion. The reprojection correctly compensates for the distortion of a single planar surface. It is nonlinear and memoryless: changes to either the image or the depth map are immediately incorporated into the reprojected image.

Let the input to the reprojection system come from a single plane slanted in space about the x-axis. The slanted plane will be vertically foreshortened in the image. If the plane were frontal, the vertical distance on the object's surface covered by the two adjacent center pixels would be

$$ h = \frac{d}{f}, $$

where $h$ is the frontal vertical distance, $f$ is the focal length, and $d$ is the distance from the lens to the surface of the plane. The distance along the slanted plane covered by the center pixels is

$$ v = \frac{d}{f\cos\theta - \sin\theta}, $$

where $\theta$ is the slant angle of the plane. The distance $v$ is larger than $h$ for positive $\theta$, smaller than $h$ for negative $\theta$, and equal to $h$ for $\theta = 0$. To rotate the plane to a frontal position in the projection plane, the image needs to be expanded or contracted by the ratio $v/h$. For a surface of arbitrary slant and tilt, the image is expanded in the vertical direction and then in the horizontal direction using gradient space $(p, q)$ information.
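The expansion ratio follows directly from the two expressions for $h$ and $v$. This minimal sketch assumes that $f$ is measured in units of the pixel spacing, which is what makes $h = d/f$ a per-pixel distance, and that the slant angle is given in radians. The function names are hypothetical.

```python
import math


def expansion_ratio(f, theta):
    """Ratio v/h for a plane slanted by theta (radians) about the x-axis.

    Assumes f is in units of the pixel spacing, so that h = d/f is the frontal
    extent covered by one pixel step; the distance d cancels out.
    """
    denom = f * math.cos(theta) - math.sin(theta)
    if denom <= 0:
        raise ValueError("the ray through the neighboring pixel misses the plane")
    return f / denom


def pixel_extents(d, f, theta):
    """Return (h, v): frontal and slanted surface extents for one pixel step."""
    h = d / f
    return h, h * expansion_ratio(f, theta)


if __name__ == "__main__":
    for deg in (-20, 0, 20, 45):
        print(deg, round(expansion_ratio(500.0, math.radians(deg)), 4))
```

Under the pixel-unit assumption, focal lengths are typically hundreds of pixels. The $\cos\theta$ factor then dominates once $|\theta|$ exceeds roughly $2/f$, so $v/h < 1$ for negative slants only when the plane is very nearly frontal.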

The reprojection function is a homolographic projection of the visible surfaces in the image. Homolographic projections result in mappings in which visible object surfaces maintain their relative 3D size. Performed on a world globe, this projection would be equivalent to Lambert's equal-area map, first developed in 1772, which is still commonly used for polar regions.

Curved surfaces can be treated as multifaceted planar surfaces: all visible surfaces can be approximated to within a finite error by planar surfaces if the planar facets are small enough. Essentially, the reprojection function 'sees' each pixel as a planar frontal surface in 3D space, which has been projected to form the image being analyzed. The fundamental assumption of our computer vision system is that a 3D object is represented as a collection of 2D reprojections of the object from widely varying viewpoints.



4) MEMORY 



The particular form of memory we deal with here, related to classical conditioning or Hebbian learning, is of the DAM (Distributed Associative Memory) type (Kohonen, 1987). DAMs are examples of NN (Neural Network) (Anderson and Rosenfeld, 1988) and/or PDP (Parallel Distributed Processing) (McClelland and Rumelhart, 1986) models. Stimulus and response vectors are associated, and the result of the association is spread over the entire memory space. Parallel and distributed computation means that information about a small part of the association can be found and processed over a large area of the memory. New associations can be placed over the older ones and interact with them. The size of the DAM stays the same regardless of the number of associations that have been memorized.

The above discussion illuminates several properties of DAM that differ from those of more traditional memories. The associations are allowed to interact with each other, and an implicit representation of structural and contextual information can develop. Consequently, a very rich level of relationships can be captured. There are few restrictions on which vectors can be associated; consequently, extensive indexing and cross-referencing can develop. Furthermore, since the information is distributed across the memory, the overall function of the recognition system becomes robust to both memory faults and degraded stimulus vectors. The DAM operation includes memory construction and recall, which are discussed next.

The construction stage assumes that there are $n$ pairs of $m$-dimensional vectors to be associated. This can be written as

$$ M s_i = r_i, \qquad i = 1, 2, \ldots, n, $$

where $s_i$ and $r_i$ denote the $i$th stimulus and response vectors, respectively. One seeks the memory matrix $M$ such that when the $k$th stimulus vector $s_k$ is projected onto the space defined by $M$, the resulting projection is the response vector $r_k$. Specifically, we have to solve

$$ M S = R, $$

where $S$ and $R$ are the corresponding stimulus and response matrices, with the vectors $s_i$ and $r_i$ as their columns. A unique solution to this equation does not necessarily exist for an arbitrary group of associations. The number of associations $n$ is usually much less than $m$, the dimension of the vectors; consequently, the system of equations is underconstrained. The constraint used to select a unique matrix $M$ is to minimize $\lVert M S - R \rVert^2$, taking the minimizer of minimum norm. This yields

$$ M = R S^{+}, $$

where $S^{+}$ is the Moore-Penrose generalized inverse of $S$ (Kohonen, 1987).

The recall operation projects an unknown vector $s$ onto the memory space $M$. The projection yields the response vector

$$ r = M s = R\,(S^{+} s). $$

Suppose the memorized stimulus vectors are linearly independent and the unknown stimulus vector $s$ is one of the memorized vectors $s_k$. Then the recalled vector is the associated response vector $r_k$. If the memorized stimulus vectors are dependent, the vector recalled by one of them will contain the associated response vector plus some cross-talk from the other stored response vectors. The recall can be viewed as a weighted sum of the response vectors. The recall, using a least-squares classifier, begins by assigning weights according to how well the unknown stimulus vector matches the memorized stimulus vectors. The response vectors are multiplied by the corresponding weights (given by $S^{+} s$) and summed to build the recall. The recall is usually dominated by the response vector whose memorized stimulus is closest to the unknown vector. The DAM provides for interactions between the stored associations and thus allows for 'some' generalization, which is useful when facing stimuli the system has not been trained on.
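Construction and recall can be sketched in plain Python. This minimal sketch assumes linearly independent stimulus vectors, so that $S^{+} = (S^{\top} S)^{-1} S^{\top}$. Dependent stimuli would need a general Moore-Penrose inverse (for example via the SVD). The function names are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("stimulus vectors are linearly dependent")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def build_memory(stimuli, responses):
    """M = R S^+, with one stimulus (response) vector per column of S (R)."""
    S, R = transpose(stimuli), transpose(responses)
    St = transpose(S)
    S_pinv = matmul(inverse(matmul(St, S)), St)   # (S^T S)^-1 S^T
    return matmul(R, S_pinv)


def recall(M, s):
    """r = M s, i.e. the response vectors weighted by S^+ s."""
    return [sum(m * x for m, x in zip(row, s)) for row in M]
```

With stimuli `[1, 0, 0, 1]` and `[0, 1, 1, 0]` mapped to responses `[1, 2]` and `[3, 4]`, each stored stimulus recalls its own response exactly. A degraded stimulus `[1, 0, 0, 0]` receives weights `S^+ s = [0.5, 0]` and recalls half of the first response.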

The DAM is augmented using conformal mapping (Wechsler and Zimmerman, 1988) to provide invariance to rotation and scale changes; the whole memory structure is shown in Fig. 2. Assume that pixels in the Cartesian plane are given as $(x, y) = [\mathrm{Re}(z), \mathrm{Im}(z)]$, where $z = x + jy$. One can then write $z = r e^{j\theta}$, where $r = |z| = (x^2 + y^2)^{1/2}$ and $\theta = \arg(z) = \arctan(y/x)$. The conformal mapping maps points $z$ onto points $w$ and is defined as

$$ w = \ln z = \ln\left(r e^{j\theta}\right) = \ln r + j\theta. $$

Points in the target domain are given by $[\ln r, \theta] = [\mathrm{Re}(w), \mathrm{Im}(w)]$. Logarithmically spaced concentric rings and radials of uniform angular spacing are mapped into uniformly spaced straight lines (see Fig. 3). If the scale and rotation factors are $k$ and $\phi$, respectively, then $z_{\mathrm{old}}$ and $z_{\mathrm{new}}$ are given by

$$ z_{\mathrm{old}} = r e^{j\theta}, \qquad z_{\mathrm{new}} = (k r)\, e^{j(\theta + \phi)}, $$

so that $w_{\mathrm{new}} = w_{\mathrm{old}} + \ln k + j\phi$. After conformal mapping, rotation and scaling about the origin thus become linear shifts in the $\theta \pmod{2\pi}$ and $\ln r$ directions, respectively. The association that the DAM builds is between the magnitude and the phase of the Fourier transform. The stimulus to the DAM is the magnitude, which is invariant to both rotation and scale changes because those correspond to mere translations after conformal mapping. The response (and then the recall) is the phase. The difference between the recorded and recalled phase indicates the amount of rotation and/or scale change.
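The shift property of the conformal mapping can be checked numerically with Python's `cmath` module. This is a minimal sketch; the function names and test points are illustrative.

```python
import cmath
import math


def log_polar(z):
    """Conformal (log-polar) coordinates [ln r, theta] of a nonzero complex point z."""
    w = cmath.log(z)                       # ln|z| + j*arg(z), with arg in (-pi, pi]
    return w.real, w.imag % (2 * math.pi)  # report theta modulo 2*pi


def shift_under(z, k, phi):
    """Change in [ln r, theta] when z is scaled by k and rotated by phi."""
    u0, t0 = log_polar(z)
    u1, t1 = log_polar(k * cmath.exp(1j * phi) * z)
    return u1 - u0, (t1 - t0) % (2 * math.pi)


if __name__ == "__main__":
    du, dtheta = shift_under(3 + 4j, k=2.0, phi=math.pi / 3)
    print(round(du, 6), round(math.log(2.0), 6))    # both 0.693147
    print(round(dtheta, 6), round(math.pi / 3, 6))  # both 1.047198
```

Whatever the starting point, scaling by $k$ shifts the first coordinate by $\ln k$, and rotating by $\phi$ shifts the second by $\phi$ modulo $2\pi$.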

Recognition takes place when processed information is matched with memory. When the correlation is high, both recognition and location of the object will occur. Our system assumes that matching, and thus the storage of models, is done in terms of 2D views. The structure of the mapping implied by the proposed model is viewer-centered, with the relative depth information placed implicitly within the stored model. As more dynamics are added to the system, these views will be dynamically 'tied' together to form the whole object. This will amount to viewer-oriented rather than viewer-centered representations.

The input to the distributed associative memory consists of the flattened characteristic views of the object. A characteristic view of a polyhedron corresponds to those viewpoints from which a specified number of planar surfaces is present in the projected image. For example, a cube has three characteristic views: one side present, two sides present, and three sides present. The system is designed to recognize both the object and the pattern printed on the object's surface.

The output from the memory is a classification vector, which measures 
how well the input view matches the stored views. It is used by the depth map 
formation system, appropriately 'filtered' by the active perception system, to 

adjust (raise or lower) the confidence plane. 
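A classification vector of this kind can be illustrated with a pseudo-inverse linear associative memory, in the spirit of Kohonen's optimal linear associative memory. The sketch below rests on several assumptions. Stored views are the columns of a matrix S. The memory matrix is M = R S⁺, where R holds the associated responses. The classification vector for an input s is S⁺s, with one weight per stored view. The random 256-dimensional vectors and the response matrix R are hypothetical. The seven labels mirror the seven-view database described at the end of this section. This is a sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical database: one column per stored (flattened) characteristic view.
labels = ["cube", "pyramid", "pyramid", "plane",
          "tetrahedron", "tetrahedron", "tetrahedron"]
S = rng.normal(size=(256, len(labels)))   # stimuli (e.g. Fourier magnitudes)
R = rng.normal(size=(256, len(labels)))   # associated responses (e.g. phases)

S_pinv = np.linalg.pinv(S)
M = R @ S_pinv                            # heteroassociative memory matrix

def classify(s):
    """Return the classification vector S^+ s and the label of its largest entry."""
    c = S_pinv @ s
    return c, labels[int(np.argmax(c))]

# A slightly corrupted version of the fifth stored view.
probe = S[:, 4] + 0.1 * rng.normal(size=256)
c, label = classify(probe)
recalled = M @ S[:, 2]                    # recall for a stored stimulus
print(np.round(c, 2), label)
```

Because the stored views are linearly independent here, S⁺S is the identity. Each stored view therefore yields a one-hot classification vector, and small corruptions only perturb it slightly.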







5) ACTIVE PERCEPTION 



The ultimate goal of our vision system is to interpret the scene 
regarding what is there and where things are. This requires a definite 
interaction between location and recognition. The best way to examine the 
system is to separate it into automatic and active parts. The automatic parts are 
functional and include the construction of the depth map, reprojection of the 

image, and recognition by the memory. The active components control the 
search process by specifying the volume of space to be analyzed by the 
reprojection system, choosing new fixation points, and determining the 
balance between reliance on cues from the locality systems and the need to 

smooth the thin membrane. The active perception system accomplishes this by 
changing the parameters of other systems using (memory) recognition 
information. 

The volume of space under inspection changes with three parameters: the attention point, the depth point, and the depth gradient. The attention point is the
3D point along the reprojected image’s line of sight. The depth point is the 

point along the line of sight where the expansion or contraction of the 

reprojection function stops. The depth gradient is the maximum gradient 
marking where the expansion or contraction of the reprojection function 
stops as well. All three of them are actively determined. Notice that the volume 
of space being processed at any moment is determined by these parameters 
and by the environment under inspection, and it corresponds to the attention 
element of active perception. 

The response from the memory is used to modify the shape of the depth 
map. The time course and extent of this modification is determined by the 
active perception system. For example, if the locality systems were sending 
information that the object under scrutiny is flat, but the memory system's 
best guess is that the object should be curved in a specific way, then after a 
certain amount of time the active perception would release the distance 
restrictions enforced by the locality systems. This is done by lowering the 
confidence in those distance estimates given by the locality systems. The 
reduction in confidence increases the importance of the recognition system 
allowing the depth map to become appropriately curved. The ability to 
override the different cues to depth is critically important both because the 
cues may be invalid for a variety of reasons and because the task may require 
it. One such task would be to recognize a person’s face from a photograph.
Stereo and motion cues could indicate the photograph is a flat surface with 
blotches of color on it, while the recognition system wants to recognize the 3D 
face that the photograph represents. 
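The text does not give a formula for how lowered confidence reshapes the depth map. The minimal sketch below makes two assumptions. It combines the locality-cue depth and the memory-predicted depth as a per-pixel confidence-weighted average. It also uses a hypothetical schedule that halves the locality confidence at each time step while the two estimates disagree. All function names, parameters and depth values are illustrative. The flat profile plays the photograph and the curved one the remembered face.

```python
import numpy as np

def fuse(d_loc, d_mem, c_loc, c_mem):
    """Per-pixel confidence-weighted average of two depth estimates."""
    return (c_loc * d_loc + c_mem * d_mem) / (c_loc + c_mem)

def relax_locality(d_loc, d_mem, c_loc=1.0, c_mem=0.2,
                   decay=0.5, tol=0.05, max_steps=20):
    """Lower the locality confidence step by step until the fused map is
    within `tol` of the memory's prediction (or the steps run out)."""
    d = fuse(d_loc, d_mem, c_loc, c_mem)
    steps = 0
    while np.max(np.abs(d - d_mem)) >= tol and steps < max_steps:
        c_loc *= decay                    # release the locality restriction
        d = fuse(d_loc, d_mem, c_loc, c_mem)
        steps += 1
    return d, c_loc, steps

# Stereo/motion say "flat"; memory says "curved like a face".
d_flat = np.full(5, 1.0)
d_face = 1.0 + 0.2 * np.array([0.0, 0.5, 1.0, 0.5, 0.0])
depth, c_loc, steps = relax_locality(d_flat, d_face)
print(np.round(depth, 3), c_loc, steps)
```

The number of steps before the map bends plays the role of the "certain amount of time" mentioned in the text.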

Finally, the active vision system will alter the interpretation process, 
choosing new viewpoints to be examined. If the recognition system cannot 
converge, then the system should physically change its position in space, and 
reinterpret the surroundings. Choosing new viewpoints corresponds to the 
exploratory aspect of active perception. In our 3D recognition system, the idea 
of active vision is not only a process of motor control, but also a process of engaging the environment in search of interpretations of the surroundings.

The experimental system has operated successfully on a database of 
seven viewpoints. The database used to create the DAM consists of one view of a 
cube, two views of a pyramid, one view of a plane, and three views of a 
tetrahedron. The recognition histograms, i.e., S⁺s, have indicated that the system can correctly identify a novel input s, where the novelty is with respect to that input’s attitude in 3D space.



6) CONCLUSIONS 



The methodology suggested herein for 3D object recognition includes explicit visual dynamics to actively engage the environment and object-centered representations. The approach thus combines object- and viewer-oriented representations through active perception. Our system, in effect, balances object recognition with an understanding of the visible space.

The memory, of DAM type, can be organized in terms of visual 
potentials, where the characteristic views are 2D (intrinsic) views and the 
edges connecting them are related to exploratory events characteristic of 
active perception. Geometric distortions are accounted for by distributing 
responsibility between low- and high-level vision. 



Note 

A longer version of this paper, by the same authors and entitled ‘Active perception and 3D 
object recognition’, appears in Active Perception and Robot Vision, edited by A.K. Sood 
and H. Wechsler, NATO ASI Series F, Vol. 83, Springer-Verlag 1992.



References 



1. Anderson, J.A. and E. Rosenfeld (Eds.): Neurocomputing, MIT Press 1988.

2. Bajcsy, R.: Active perception, Proc. IEEE, 76(8), 996-1005 (1988).

3. Kass, M., A. Witkin, and D. Terzopoulos: Snakes: Active contour models, Int. Conf. on Computer Vision, London, England, 259-268 (1987).

4. Koenderink, J. and A. van Doorn: The internal representation of solid shape with respect to vision, Biol. Cybern., 32, 211-216 (1979).

5. Kohonen, T.: Self-Organization and Associative Memory (2nd ed.), Springer-Verlag 1987.

6. McClelland, J., D. Rumelhart and the PDP Research Group (Eds.): Parallel Distributed Processing, MIT Press 1986.

7. Terzopoulos, D., A. Witkin, and M. Kass: Symmetry-seeking models for 3D object reconstruction, Int. Conf. on Computer Vision, London, England, 269-276 (1987).

8. Wechsler, H. and L. Zimmerman: 2-D Invariant Object Recognition Using Distributed Associative Memory, IEEE Trans. on Pattern Analysis and Machine Intelligence, 10(6), 811-821 (1988).




Figure 1. Block Diagram of the Wechsler and Zimmerman 3D Object Recognition System





Figure 3. Conformal Mapping of the Cartesian Half-Plane 





A Model of the Acquisition of Object 
Representations in Human 3D Visual 
Recognition 



S. Edelman, D. Weinshall, H. H. Bülthoff, T. Poggio



Center for Biological Information Processing, 
MIT E25-201, Cambridge MA 02139, USA 



1 Motivation 

A common approach to the study of visual recognition postulates that there exist in the 
visual system representations of familiar objects and scenes. To recognize an object, the 
system compares it with each of the stored models. Such a comparison would appear possible
only after the input image and the stored representations are brought to a common
form. Consequently, the nature of representation must be reflected in the performance of 
the system [7]. 

One possibility is that the visual system stores a few representative (canonical) views of 
each known object, along with the information that permits it to normalize the appearance 
of an input object by computing how it would look from a canonical viewpoint [9].
Palmer, Rosch and Chase [10] found that canonical views of commonplace objects can be 
reliably characterized using several criteria. For example, when asked to form a mental 
image of an object, people usually imagine it as seen from a canonical perspective. In 
recognition, canonical views are identified more quickly than others, with response times 
decreasing monotonically with canonicality (as defined, e.g., by subjective ratings). 

This dependency of response time on the distance to a canonical view is expected 
if one draws an analogy between recognition by viewpoint normalization on one hand 
[6, 18] and mental rotation on the other hand [15, 14]. The very existence of canonical 
views may then be attributed to a tradeoff between the amount of memory invested in 
storing object representations and the amount of time that must be spent in viewpoint
normalization. Remembering a frequently encountered view of an object may lead to its
faster recognition in subsequent encounters. 

By the same argument, no preferred perspective should exist for familiar objects that 
are equally likely to be seen from any viewpoint. Indeed, there is evidence that
normalization effects on recognition latency (as reflected in the existence of preferred views)
disappear with practice for a variety of 2D stimuli such as line drawings of common objects 
[3], random polygons [5], pseudo-characters [4] and stick figures [17]. 

The aim of the present work is to model some phenomena related to canonical views 
and viewpoint normalization in object recognition. The work is based on psychophysical 
experiments that employed methods differing in several respects from previous studies. 
First, our stimuli were images of novel 3D objects with controlled complexity. This 
facilitated the study of the effects of object complexity and familiarity on recognition. 
Second, the stimuli appeared in various 3D orientations, bringing the experimental viewing 
conditions closer to those of real-world vision. Third, our task did not involve a handedness 
decision (such as whether the displayed object was a mirror image of the target), avoiding 
at least one source of criticism in interpreting the results. Fourth, subjects were not 
required to name the stimuli. This reduced the number of different cognitive modules 
required for solving the task, bringing the reaction time closer to the actual duration of 
recognition. 

The rest of the paper is organized as follows. The next two sections describe the 
psychophysical experiments and interpret the results. Sections 4 and 5 outline a simple 
model of human performance in the experiments and compare simulation results with 
psychophysical data. Finally, section 6 is a short summary of our conclusions. 

2 Experiments 

Define the viewpoint coordinates of an observer with respect to an object, θ and φ, as the longitude and the latitude of the eye (or the camera) on an imaginary sphere centered at the object. One would expect a function R(θ, φ), measuring the ease of recognition for a 3D object, to possess one or more peaks, corresponding to its canonical views. We assessed the dependency of R on the object’s complexity and on its familiarity to the subject, using a two-alternative forced-choice reaction time paradigm.








Figure 1: Examples of wire-like objects. Shaded, grey-scale images of similar wires were 
used as stimuli in the experiments. 

2.1 Stimuli 

We used the Symbolics S-Geometry™ 3D graphics package to generate novel wire-like objects of small, nonzero thickness (Figure 1). This permitted us to simulate surface shading, while minimizing object self-occlusion. The objects were created in three steps. First, a straight five-segment chain of vertices was made. Second, each vertex was displaced in 3D by a random amount, distributed normally around zero. The variance of the displacements determined the complexity of the resulting wire. Third, the resulting object was scaled, so that all the wires were of the same length.
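The three-step construction can be sketched as follows. This minimal NumPy version makes three assumptions: a five-segment chain has six vertices laid out along the x axis, "length" means the total length of the polyline, and the standard deviation `sigma` of the displacements sets complexity. The function name and the three `sigma` values are hypothetical, and shading or rendering is not modelled.

```python
import numpy as np

def make_wire(sigma, n_segments=5, total_length=1.0, rng=None):
    """Return an (n_segments + 1) x 3 array of vertices for one random wire."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: a straight chain of equally spaced vertices along the x axis.
    verts = np.zeros((n_segments + 1, 3))
    verts[:, 0] = np.linspace(0.0, 1.0, n_segments + 1)
    # Step 2: displace every vertex in 3D by zero-mean normal noise;
    # a larger sigma (variance sigma**2) gives a more complex wire.
    verts += rng.normal(0.0, sigma, size=verts.shape)
    # Step 3: rescale about the centroid so that all wires have the same length.
    verts -= verts.mean(axis=0)
    length = np.linalg.norm(np.diff(verts, axis=0), axis=1).sum()
    return verts * (total_length / length)

# Three complexity groups of ten wires each, as in the experiment.
rng = np.random.default_rng(0)
wires = {s: [make_wire(s, rng=rng) for _ in range(10)] for s in (0.05, 0.15, 0.3)}
```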

2.2 Method 

Thirty novel 3D objects, generated according to the procedure described above and 
grouped by average complexity into three sets of ten, served as stimuli in the experi- 
ment. 144 evenly spaced images of each of the objects were produced by stepping the 
camera by 30° increments in latitude and longitude. The images were rendered with the 
Symbolics S-Render™ program, using the Lambertian surface reflectance model, with a 
point light source of intensity 1.0 (located at the camera) and an ambient light source of 
intensity 0.3. During the experiments, the images were displayed on a CRT monitor, on
a dark background, under subdued ambient illumination. The images subtended an angle 
of approximately 6° at a distance of 120 cm. 

The basic experimental run used ten objects of the same complexity as stimuli and 
consisted of ten blocks, in each of which a different object was defined as the target for 
recognition. Each block had two phases: 






Training: At the beginning of each block, the subject was shown all 144 views of the
target twice, in a natural succession. The target was perceived as tumbling in space, 
with the kinetic depth effect contributing to the three-dimensional appearance of 
the object. 

Testing: In the rest of the block, the subject was presented with a sequence of stimuli, 
shown one at a time. Half of these were familiar views of the target. The other half 
were views of the rest of the objects from the current set. For each object, a subset 
of 16 views (spaced by 90° in latitude and longitude) was used in the test phase. 
Each of the 16 views of the target appeared during the test phase five times. To 
facilitate later analysis, the first three and the last two appearances of each view 
were labeled, respectively, “session 1” and “session 2”. 

The appearance of a stimulus was preceded by a fixation point. The stimuli stayed on 
until the subject responded. The response times were measured in a two-alternative forced-choice
paradigm. The subject had to press one key if the displayed object was the current
target, and another key otherwise. No feedback was given as to the correctness of the 
response. 

Three subjects (the first three authors) participated in the experiment.¹ The basic
experiment was repeated three times (once per complexity group) over a period of a
few days. Altogether, 14400 responses were obtained.
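The counts given in this section are consistent with one another, as the following bookkeeping sketch shows. It rests on two assumptions. First, longitude and latitude are each stepped through the full 0° to 330° range, which gives 12 × 12 = 144 views. Second, the distractor trials in a block are as numerous as the target trials, as "half of these" implies. All variable names are illustrative.

```python
from itertools import product

STEP = 30                                                  # degrees between camera positions
all_views = list(product(range(0, 360, STEP), repeat=2))   # (longitude, latitude)
test_views = [(lon, lat) for lon, lat in all_views
              if lon % 90 == 0 and lat % 90 == 0]          # 90-degree test subset

def session_of(appearance):
    """Appearances 1-3 of a test view form session 1; appearances 4-5 form session 2."""
    return 1 if appearance <= 3 else 2

APPEARANCES = 5
target_trials = len(test_views) * APPEARANCES              # target trials per block
trials_per_block = 2 * target_trials                       # plus an equal number of distractors
responses = 3 * 3 * 10 * trials_per_block                  # subjects x complexity runs x blocks
print(len(all_views), len(test_views), trials_per_block, responses)
```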

To assess the strength of the session effect over a wider range of familiarity, one of the 
subjects was tested in an additional, identical experiment. This subject saw every view 
of each target object 10 times, as compared to 5 times for the other two subjects. The 
results of the two experiments appear below. 

2.3 Results 

In the following analysis we used only the data from those observations in which the 
stimulus shown was actually the target (as opposed to one of the distractors)². Latencies
of correct responses (response times or RTs) and error rates (ERs) were averaged to yield 
a single value per session per view per object. RTs longer than 3 sec or shorter than 
250 msec were discarded. No evidence of time/accuracy tradeoff was found. 

¹ The findings reported here have since been replicated with other, naive subjects [1].

² The reasoning behind this decision, as well as other details of the experiments, can be found in [1].
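The data reduction described in the preceding paragraph can be sketched in plain Python. The text does not fix two details, so this sketch assumes them. First, the 250 msec to 3 sec window is applied only to correct responses when averaging RT. Second, the error rate is computed over all target trials for that session, view and object. The trial records and field names are hypothetical, and the caller is assumed to have already kept only target trials.

```python
from collections import defaultdict
from statistics import mean

RT_MIN, RT_MAX = 0.250, 3.0   # seconds; correct RTs outside this range are dropped

def summarize(trials):
    """Average target-trial data per (session, view, object).

    Each trial is a dict with keys 'session', 'view', 'obj', 'rt' (s) and
    'correct' (bool). Returns {key: (mean_rt, error_rate)}.
    """
    rts, errors = defaultdict(list), defaultdict(list)
    for t in trials:
        key = (t["session"], t["view"], t["obj"])
        errors[key].append(0 if t["correct"] else 1)
        if t["correct"] and RT_MIN <= t["rt"] <= RT_MAX:
            rts[key].append(t["rt"])
    return {key: (mean(rts[key]) if rts[key] else None, mean(errors[key]))
            for key in errors}

trials = [
    {"session": 1, "view": (0, 0), "obj": 3, "rt": 0.62, "correct": True},
    {"session": 1, "view": (0, 0), "obj": 3, "rt": 0.70, "correct": True},
    {"session": 1, "view": (0, 0), "obj": 3, "rt": 3.40, "correct": True},   # too slow
    {"session": 1, "view": (0, 0), "obj": 3, "rt": 0.90, "correct": False},
]
print(summarize(trials))
```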








The decrease of the mean RT with practice was a basic effect that we had expected to 
find. This effect would have masked any differential effects of familiarity on the recognition 
of objects from different viewpoints, unless a measure of canonicality (the advantage of 
some views over others) insensitive to the overall decrease in mean RT were used. We 
chose the coefficient of variation of RT over the different views (defined as the ratio of 
the standard deviation of RT to the mean of RT) as one measure of the strength of the 
canonicality effect, and used analysis of variance to find its dependency on familiarity. 
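The reason for choosing the coefficient of variation is that it divides out the overall speed-up. In the short check below, multiplying every view's RT by the same factor leaves the c.v. unchanged. Only changes in the relative advantage of some views over others therefore show up in it. The RT values are hypothetical. The text does not say whether the population or the sample standard deviation was used, so the population form is assumed here.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by mean (population SD assumed)."""
    return pstdev(values) / mean(values)

rt_by_view = [0.60, 0.70, 0.80, 0.90]      # hypothetical mean RTs (s) for four views
faster = [0.8 * rt for rt in rt_by_view]   # uniform 20% speed-up with practice
cv_before = coefficient_of_variation(rt_by_view)
cv_after = coefficient_of_variation(faster)
print(round(100 * cv_before, 1), round(100 * cv_after, 1))   # both about 14.9 %
```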

A different way to assess the canonical views effect is by looking for an explicit depen- 
dency of the RT on the attitude of the object relative to the observer. In this case data 
cannot be pooled over different objects, unless a common reference attitude is defined. 
One possibility is to define the (subject-specific) best view for each object as the view 
with the shortest RT. One could then characterize RT as a function of object attitude by 
measuring its dependency on D = D(subject, target, view), the distance between the best 
view and the actually shown view. We used regression analysis to characterize RT (D) 
and ER(D). 
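The text does not specify how the distance D between two views was computed, so the sketch below is one plausible version under stated assumptions. Each (longitude, latitude) view is taken to be the object rotated by the longitude about the vertical axis and then by the latitude about a horizontal axis. D is then the angle of the rotation between the two views, in multiples of 30°. The synthetic RTs are generated from a quadratic with the session 1 coefficients of section 2.4.3, so the fit simply recovers them. The rotation convention and all function names are assumptions.

```python
import numpy as np
from itertools import product

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def view_rotation(view):
    """Assumed convention: longitude about z first, then latitude about x."""
    lon, lat = np.radians(view)
    return rot_x(lat) @ rot_z(lon)

def distance(v1, v2):
    """Angle of the rotation taking view v1 to view v2, in units of 30 degrees."""
    rel = view_rotation(v1).T @ view_rotation(v2)
    cos_a = np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_a)) / 30.0

test_views = list(product(range(0, 360, 90), repeat=2))
# Synthetic RTs (s) for one subject/target pair, built around view (0, 0).
true_coef = (-0.013, 0.095, 0.576)                       # D^2, D, intercept
rt = {v: np.polyval(true_coef, distance((0, 0), v)) for v in test_views}

best = min(rt, key=rt.get)                               # shortest-RT view
D = np.array([distance(best, v) for v in test_views])
coef = np.polyfit(D, [rt[v] for v in test_views], deg=2) # [b2, b1, b0]
print(best, np.round(coef, 3))
```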

The rest of this section summarizes the experimental results (see [1] for details), which 
are further discussed in section 3. 

2.4 Experiment 1: two sessions of 3 and 2 exposures per view 

2.4.1 Analysis of response times and error rates 

Although the raw results exhibited considerable variation of mean RT and mean ER across 
subjects, the effects pertinent to the canonical views and mental rotation phenomena were 
stable and uniform (see below). Mean RTs were 0.75, 0.69 and 0.62 sec for low, middle 
and high complexities. Grouped by session, the RTs were 0.71 and 0.66 sec for sessions 1 and 2, respectively³. The only significant interaction was that of complexity × subject. The mean ER was 17.9% for low complexity, 12.0% for high complexity, and 9.7% for middle complexity (the last difference was not significant). The mean ER in session 2, 15.2%, was higher than in session 1, 11.2%.

³ All differences among the means reported here and below were found significant by Duncan’s multiple-range test at p < 0.05, unless otherwise noted.








Figure 2: Experiment 1: coefficient of variation of RT (%) over views for the two sessions, 
by subject and complexity (square, triangle and dot stand for DW, HHB and SYE). The 
decrease of the c.v. of RT with session is significant. 






Figure 3: Experiment 1: coefficient of variation of ER (%) over views for the two sessions, 
by subject and complexity (square, triangle and dot stand for DW, HHB and SYE). The 
effect of session is significant, mainly due to DW’s contribution. 



2.4.2 Analysis of the coefficient of variation of response time and error rate 

The coefficient of variation of RT over different views of objects decreased with practice 
(see Figure 2). Effects of subject and session, but not of complexity, were significant. All 
three means by complexity were close to 26%. The means by session were 29.1% and 
23.8% for sessions 1 and 2. 

For ER (see Figure 3), all main effects were significant. The means of the coefficient of 
variation of ER by complexity were 156%, 186% and 206% for low, high and middle sets, 
respectively (the last difference was not significant). The means by session were 168% 
and 198% for sessions 1 and 2. 








Figure 4: Regression curves of RT on D for the two sessions of experiment 1. Means and 
standard errors of over 1000 points are shown. RT is measured in sec, D in multiples
of 30°. D = 0 corresponds to the best view. The lower curve refers to session 2.

2.4.3 Regression analysis of RT, ER 

Regression analysis yielded a significant quadratic component. The dependency of RT on D and D² for session 1 was

$$RT = 0.576 + 0.095D - 0.013D^2.$$

It remained significant for session 2:

$$RT = 0.558 + 0.076D - 0.010D^2.$$
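The sketch below shows what the quadratic terms imply. It evaluates the regression equations of experiment 1 (both sessions) and of experiment 2, session 1 (section 2.5.2) over D = 0, ..., 6, i.e. from 0° to 180°. For a concave quadratic b0 + b1·D + b2·D², RT peaks at D = −b1/(2·b2). Because D is in units of 30°, dividing 30° by the linear coefficient gives the rotation rate discussed in section 3.3. The coefficients are taken from the equations in the text, and the code only does the arithmetic.

```python
fits = {                      # (b0, b1, b2) for RT = b0 + b1*D + b2*D**2, RT in s
    "exp1 session 1": (0.576, 0.095, -0.013),
    "exp1 session 2": (0.558, 0.076, -0.010),
    "exp2 session 1": (0.475, 0.058, -0.007),
}

def rt(D, b0, b1, b2):
    return b0 + b1 * D + b2 * D * D

for name, (b0, b1, b2) in fits.items():
    d_peak = -b1 / (2 * b2)                       # vertex of the parabola
    curve = [round(rt(D, b0, b1, b2), 3) for D in range(7)]
    print(f"{name}: peak at {30 * d_peak:.0f} deg, RT(D=0..6) = {curve}")

rate_deg_per_s = 30.0 / fits["exp1 session 1"][1]   # about 316 deg/s
print(f"implied rotation rate: {rate_deg_per_s:.0f} deg/s")
```

All three curves peak between 90° and 180° and fall again toward 180°, which is the shape discussed in section 3.3.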

The regression of RT on the distance to a random view (fixed for each object and 
subject), computed as a control, was not significant. Notably, the regression of ER on D 
and D 2 was also not significant, either for session 1, or for session 2. 

The shapes of the regression curves of RT for the two sessions of experiment 1 seem to be different (see Figure 4). A multivariate test of the difference between the two sets of regression coefficients⁴ fell short, however, of confirming this impression. This was the main reason for carrying out experiment 2.

⁴ Excluding the intercepts: we were not interested in a mere uniform decrease of RT for all views.

2.5 Experiment 2: two sessions of 5 exposures per view each

In this experiment, one of the original subjects (SYE) was tested repeatedly, to elucidate the dependency of regression results on object familiarity (the outcome of an identical experiment, conducted subsequently on four naive subjects, was the same as described below; see [1], experiment 3). For this subject, the responses of both sessions of the previous experiment, consisting together of 5 trials per view per object, were combined, and an additional 5-trial session was performed. The results of this experiment appear below.



2.5.1 Analysis of coefficient of variation of RT, ER 




Figure 5: Coefficient of variation of RT over views (%) for the two sessions of experi- 
ment 2, by complexity (dot, square and triangle mark low, middle and high complexity, 
respectively). 



The plot of the coefficient of variation of RT for experiment 2 (Figure 5) shows that 
it decreased with session for the low and the medium, but not for the high, complexity 
groups. The overall effect of session was significant. 

The plot of the coefficient of variation of ER for experiment 2 appears in Figure 6. 
Only the main effect of complexity was significant. A separate analysis for session by 
complexity revealed no significant effects of session in any complexity group. 

2.5.2 Regression analysis of RT, ER 

Regression of RT on D and D² for session 1 (see Figure 7) was significant, giving

$$RT = 0.475 + 0.058D - 0.007D^2.$$

Importantly, it was not significant for session 2. That is, the dependence of RT on the distance to the best view was strongly diminished. Regression of ER was not significant for either session.








Figure 6: Coefficient of variation of ER over views (%) for the two sessions of experiment 2, by complexity (dot, square and triangle mark low, middle and high complexity, respectively).







Figure 7: Regression curves of RT on D for the two sessions of experiment 2. Scale 
labeling is as in the previous regression plot. The flatter curve refers to session 2. 






3 Discussion 

3.1 Complexity effects 

The influence of stimulus complexity on mean RT and ER was in part expected (higher 
complexity resulted in longer RT and higher ER than middle complexity), and in part 
unexpected (lower complexity had a similar effect). A possible explanation involves the 
notion of viewpoint-invariant, non-accidental features of 3D objects [6]. These features
are more likely to be present in objects that have, by our definition, higher complexity. 
While the presence of features such as collinear segments can facilitate recognition, having 
too many of them would have an opposite effect, e.g., by prompting the subject to resort 
to a more complicated procedure. Having too few of these features could also impede 
recognition (by increasing ambiguity). 

Stimulus complexity had no effect on the coefficient of variation of RT over views. It 
appears that most of the variation of RT (as opposed to the mean RT) is due to factors 
other than complexity, such as the general outlook of our stimuli (e.g., an elongated wire 
seen end-on would be naturally harder to recognize than the same wire seen from the 
side). On the other hand, stimulus complexity affected the coefficient of variation of ER 
over views. We do not attempt to interpret this effect, because of the possible subject ×
complexity interaction (see the difference between the data for subject DW and the other 
two subjects in Figure 3). 

3.2 Session (familiarity) effects 

Our data indicate a clear effect of familiarity on the prominence of canonical views, at 
least for the kind of objects we have used as stimuli. Familiarity appears to reduce 
the differences in RT among different views of the object (see Figure 2), and to render 
insignificant possible effects of mental rotation, as manifested in the dependency of RT on 
the distance to the canonical view (Figure 7). We interpret session effects in the absence 
of feedback as an indication of imprinting of familiar views that happens merely as a 
result of repeated exposure. 

3.3 Interpreting regression results 

Experimental results in which recognition time of an object depended on the amount of 
rotation necessary to bring it to a familiar orientation have been previously interpreted
in terms of mental rotation [17]. The major argument in favor of this interpretation
is indirect and has to do with similarity between the slope of the regression curve in
recognition and in classical mental rotation tasks [16, 14]. The reciprocal of the coefficient
of D in the regression equation for RT(D) in session 1 in our experiments (approximately
300 deg/sec) is also consistent with that of mental rotation. This result, along with the
apparent absence of an orderly dependence of ER on D, can be accommodated by a
theory of recognition that involves two distinct stages: normalization and comparison (cf. 
Ullman’s recognition by alignment [18]). In the normalization stage, the image and 
a model are brought to a common attitude in a visual buffer. This operation could be 
done by a process analogous to mental rotation, which would take time proportional to 
the attitude difference between the image and the model. Subsequently, a comparison 
would be made between the two. The time to perform the comparison could depend, 
e.g., on the object’s complexity, but not on its attitude, so that the comparison stage 
would contribute a constant amount to the overall recognition time. On the other hand, 
the error rate of recognition would be largely determined by the comparison stage. With 
practice, more views of the stimuli could be retained by the visual system, resulting in 
a smaller average amount of rotation necessary to normalize the input to a standard, or 
canonical, appearance. Thus, the mean response time (determined by the normalization 
process) would decrease, but the mean error rate (determined by the comparison process) 
would not, because of the absence of feedback to the subject. This is compatible with our 
observations. 
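A toy calculation makes the two-stage account concrete. This sketch makes four assumptions. Views are directions on the viewing sphere. Normalization rotates the input to the nearest stored view at the roughly 300 deg/sec rate estimated above. Comparison adds a constant time. The sets of stored views are nested, so that each larger set includes the smaller ones. The 0.45 s comparison time, the random stored views and all names are hypothetical. Storing more views can only shorten the rotation to the nearest one, so mean RT falls. The comparison stage, and with it the error rate in this account, is left untouched.

```python
import numpy as np

RATE_DEG_PER_S = 300.0   # approximate rotation rate quoted in the text
T_COMPARE = 0.45         # hypothetical constant comparison time (s)

def random_directions(n, rng):
    """n unit vectors drawn uniformly on the viewing sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def mean_rts(stored_counts, n_inputs=5000, seed=0):
    """Mean RT = comparison time + time to rotate to the nearest stored view."""
    rng = np.random.default_rng(seed)
    stored = random_directions(max(stored_counts), rng)   # nested sets of stored views
    inputs = random_directions(n_inputs, rng)
    cosines = np.clip(inputs @ stored.T, -1.0, 1.0)
    rts = []
    for n in stored_counts:
        nearest_deg = np.degrees(np.arccos(cosines[:, :n].max(axis=1)))
        rts.append(T_COMPARE + nearest_deg.mean() / RATE_DEG_PER_S)
    return rts

print(np.round(mean_rts([1, 2, 4, 8, 16]), 3))
```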

The strong quadratic component in the regression equations for RT(D) may signify 
the presence of more than one preferred, or canonical, view. Imagine the viewing sphere 
(see section 2) centered around a wire-like object, with the best (shortest-RT) view at the 
north pole. Then the view from the south pole of the sphere (at D = 180° from the north 
pole) ought to yield shorter RT than views from the equator, because the projection of a 
wire looks almost the same from two diametrically opposite directions. This may explain 
the shape of the regression curve for RT (D). 

3.4 Summary of recognition psychophysics 

To recapitulate, our main findings are as follows. 

• Stimulus complexity has little effect on the variation of RT over views;

• Stimulus familiarity reduces the variation of RT over views;

• Familiarity reduces the effect that can be interpreted in terms of mental rotation, namely, the dependency of RT on the distance to the canonical view.

Figure 8: The network consists of two layers, F (input, or feature, layer) and R (representation layer). Only a small part of the projections from F to R is shown. The network encodes input patterns by making units in the R-layer respond selectively to conjunctions of features localized in the F-layer. The curve connecting the representations of the different views of the same object in the R-layer symbolizes the association that builds up between these views as a result of practice.

These effects support the notion of a tradeoff between time required for viewpoint nor- 
malization and memory invested in storing multiple views of objects. One possible com- 
putational interpretation of our findings is in terms of a two-stage process of recognition 
by normalization, gradually superseded with practice by a more memory-intensive, less 
time-consuming strategy. The following section explores a different model, which appears 
to be equally capable of reproducing our data. 

4 The model 

Can a simpler computational process than the two-stage recognition by alignment 
account for our results? To address this question, we simulated the experiments described 
above with a two-layer network of thresholded summation units (see Figure 8) [2]. The 
“stimuli” in the simulation were the projections of the vertices of the same wire objects 
used in the actual experiment. This allowed us to make a direct qualitative comparison of 




Ill 



the simulation results with the data from human subjects. In the following, we describe 
briefly the principle of operation of the network. 

4.1 Learning 

The first (input, or feature) layer of the network is a feature map. Every unit in the F- 
layer is connected to all units in the second (representation) layer. The strength of these 
“vertical” (V) connections initially follows a Gaussian distribution. In addition, the units
in the representation layer are connected among themselves by lateral (L) connections,
whose initial strength is zero. The V-connections form view-specific representations. The
L-connections form associations among different views of the same object. 

The input is a sequence of appearances of an object, encoded by the 2D locations 
of concrete sensory features (line terminators and corners) rather than a list of abstract 
features. At the first presentation of a stimulus several representation units are active, all 
with different strengths (due to the Gaussian distribution of vertical connection strengths). 
We employ a winner-take-all (WTA) mechanism to identify the strongest active R-unit. Hebbian relaxation then enhances V-connections from the input layer to the winner. Specifically, the connection strength $v_{ab}$ from F-unit $a$ to R-unit $b$ changes by

$$\Delta v_{ab} = \min\left\{ \alpha A_a [A_b - T_b]_+ v_{ab},\; v^{\max} - v_{ab} \right\} \qquad (1)$$

where $A_u$ and $T_u$ are the activity and the threshold of unit $u$, $v^{\max}$ is an upper bound on connection strength and $\alpha$ is a parameter controlling the rate of convergence ($[x]_+$ is defined as $\max\{x, 0\}$). The threshold of the winner R-unit is increased by

$$\Delta T_b = \ldots \qquad (2)$$

where $\delta < 1$. As a result, this R-unit encodes the spatial structure of a specific view,
responding selectively to this view after only a few (two or three) presentations. 
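Equation (1) together with the winner-take-all step can be sketched in NumPy. None of the following assumptions comes from the text. R-unit activity is taken to be the weighted sum of F-unit activities. Initial vertical strengths are drawn from a Gaussian with mean 0.5 and clipped to be positive. The F-layer is an 8 × 8 binary map with five hypothetical active feature locations. Thresholds are held fixed, since the threshold update of equation (2) is not modelled. With these settings the active connections to the winner reach $v^{\max}$ within three presentations, while all other connections stay unchanged.

```python
import numpy as np

def relu(x):
    """[x]_+ = max{x, 0}."""
    return np.maximum(x, 0.0)

def present(v, thresholds, a_in, alpha=0.5, v_max=1.0):
    """One presentation: WTA over R-units, then equation (1) on the winner's inputs.

    v[a, b] is the vertical strength from F-unit a to R-unit b (updated in place).
    """
    a_r = a_in @ v                                   # R-layer activities (weighted sums)
    b = int(np.argmax(a_r))                          # winner-take-all
    dv = np.minimum(alpha * a_in * relu(a_r[b] - thresholds[b]) * v[:, b],
                    v_max - v[:, b])
    v[:, b] += dv
    return b

rng = np.random.default_rng(0)
n_f, n_r = 64, 20
v = np.clip(rng.normal(0.5, 0.1, size=(n_f, n_r)), 0.01, 1.0)
thresholds = np.full(n_r, 1.0)
a_in = np.zeros(n_f)
active = [3, 17, 28, 42, 60]                         # hypothetical feature locations
a_in[active] = 1.0
v_before = v.copy()
winners = [present(v, thresholds, a_in) for _ in range(3)]
```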

The principle by which specific views of the same object are grouped is that of temporal 
association. New views of the object appear in a natural order, corresponding to their 
succession during an arbitrary rotation of the object. The lateral (L) connections in the 
representation layer are modified by a time-delay Hebbian relaxation. The L-connection $w_{bc}$ between R-units $b$ and $c$ that represent successive views is enhanced in proportion to their closeness in time, up to a certain time difference $K$:

$$\Delta w_{bc} = \sum_{|k| \le K} \gamma_k \,[A_b^{t} - T_b^{t}]_+ \,[A_c^{t+k} - T_c^{t+k}]_+ \qquad (3)$$

The appearance of a new object is explicitly signalled, so that two different objects do not become associated by this mechanism. The parameter $\gamma_k$ decreases with $|k|$, so that the association is stronger for units whose activation is closer in time. In this manner, a footprint of temporally associated view-specific representations is formed in the second layer for each object. Together, they form a distributed multiple-view representation of the object.
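Equation (3), accumulated over one object's view sequence, can be sketched as follows. The sketch makes four assumptions. The R-layer activities over time are given as an array, and the thresholds are held fixed over time. $\gamma_k$ is supplied as a short list indexed by $|k|$, here with K = 2. Self-connections are discarded. The function is called separately for each object, which stands in for the explicit new-object signal. The one-hot activity pattern in the example, with one winner unit per view, is illustrative.

```python
import numpy as np

def associate_views(acts, thresholds, gammas):
    """Accumulate lateral weights w[b, c] by time-delay Hebbian learning.

    acts[t, b]: activity of R-unit b when view t of one object is shown.
    gammas[m]: weight gamma_k for |k| = m, m = 0..K (decreasing with m).
    """
    n_t, n_units = acts.shape
    out = np.maximum(acts - thresholds, 0.0)         # [A - T]_+
    K = len(gammas) - 1
    w = np.zeros((n_units, n_units))
    for t in range(n_t):
        for k in range(-K, K + 1):
            if 0 <= t + k < n_t:
                w += gammas[abs(k)] * np.outer(out[t], out[t + k])
    np.fill_diagonal(w, 0.0)                         # no self-connections (assumed)
    return w

acts = np.eye(5)                    # view t activates R-unit t only
w = associate_views(acts, thresholds=np.zeros(5), gammas=[1.0, 0.5, 0.25])
print(w)
```

Neighbouring views end up linked with weight 0.5, views two steps apart with 0.25, and views further apart than K stay unlinked.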



4.2 Recognition 

An input presented to the feature layer produces a pattern of activity in the representation 
layer. We define the object whose footprint is closest to this activity pattern as the 
outcome of recognition. A measure of closeness between two patterns is provided by 
correlation. This choice may be clarified by considering a model of decision-making in
recognition in which many units (possibly with different initial levels of activation) encode
the known entities (one unit per entity; cf. [8, 11]; in our case, several units together encode
an object). When an input is present, each unit’s activation is increased in proportion to
the similarity between the input and the concept that the unit represents. The decision 
threshold, initially kept high to discourage false alarms, is gradually decreased, until it is 
exceeded by some unit’s activation. Recognition latency in this scheme clearly depends 
on the activation induced by the input in the would-be strongest representation unit. In 
our scheme, this activation is measured by the correlation between the actual footprint 
induced by the input and the prototypical memory trace of this footprint. This correlation 
also serves as an analog of response time. 
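The recognition step and the decision model described above can be sketched as follows, assuming that footprints are stored as vectors over R-units and that Pearson correlation is the closeness measure. The `latency` function is a deliberately crude, hypothetical stand-in for the decreasing decision threshold: it only shows why a higher correlation yields a shorter simulated response time.

```python
import numpy as np

def recognize(activity, footprints):
    """Return (best_object, corr): the stored footprint most correlated
    with the current activity pattern in the representation layer.

    footprints: dict mapping object name -> 1-D array over R-units
    (a hypothetical storage format for the prototypical memory traces).
    """
    scores = {name: float(np.corrcoef(activity, fp)[0, 1])
              for name, fp in footprints.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def latency(corr, start=1.0, step=0.05):
    """Toy decision stage: the threshold starts high and drops by `step`
    per tick until the winning unit's activation (here, corr) exceeds it.
    Since corr >= -1, the loop always terminates."""
    ticks, threshold = 0, start
    while corr <= threshold:
        threshold -= step
        ticks += 1
    return ticks
```

In this sketch, an input whose footprint correlates at 0.9 with a stored trace crosses the falling threshold after fewer ticks than one correlating at 0.5, which is the sense in which correlation serves as an analog of (inverse) response time.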

5 Simulation 

We were able to reproduce all three main results of the psychophysical experiment
described in the previous section, with a random choice of the parameters of the network
model. First, the coefficient of variation of the correlation (CORR) over views did not
depend on stimulus complexity. Second, the variation of CORR over views
decreased with practice (Figure 9). Third, the dependence of CORR on stimulus attitude
(Figure 10) diminished with practice.
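For reference, the coefficient of variation used in these comparisons is the standard deviation of CORR over views divided by its mean. The sketch below assumes CORR values arranged as objects × views and uses the sample standard deviation (`ddof=1`); the text does not state which estimator was used.

```python
import numpy as np

def cv_over_views(corr):
    """Coefficient of variation (sample SD / mean) of CORR across views.

    corr: array-like of shape (n_objects, n_views).
    Returns one CV per object; averaging these per session gives a
    scale-free summary of how much CORR varies over views.
    """
    corr = np.asarray(corr, dtype=float)
    return corr.std(axis=1, ddof=1) / corr.mean(axis=1)
```

Because the CV is normalized by the mean, a decrease from session 1 to session 2 indicates that CORR became more uniform across views, not merely that its overall level changed.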

Figure 9: Coefficient of variation of CORR over views for the two sessions, by complexity, 
before the introduction of shortcuts into the footprint (see text). 




Figure 10: Regression of CORR on distance to the best view, by session, before the 
introduction of shortcuts into the footprint (see text). Note the similarity to the regression 
plot of experiment 1, keeping in mind that high CORR is analogous to low RT. 

The effect of session on the coefficient of variation of CORR was significant (F(1, 16) =
15.88, p < 0.001). A multivariate test of the difference between the sets of regression
coefficients corresponding to sessions 1 and 2 (excluding the intercept) was not significant
(F(2, 157) = 1.5, p = 0.23; compare this to the outcome of an analogous test of the
regression coefficients for experiment 1 in section 2.4.3), raising questions regarding the
ability of the model to replicate the flattening out of the regression of RT on D. To
further test this ability, we allowed the lateral connections in the representation layer to
be enhanced during the test phase of the simulated experiment, in addition to
their enhancement during the training phase (controlled by $\gamma_k$ in equation 3). As a
result, shortcuts appeared in the sequences of R-units representing successive views of
objects, obliterating the linear structure of these sequences (footprints) that was responsible for
the semblance of mental rotation apparent in Figure 10.
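One way to picture why shortcuts flatten the dependence on distance from the best view is to treat a footprint as a graph of R-units linked in order of succession. In the toy sketch below, the number of lateral links separating a view from the best view stands in for that dependence; this proxy is our assumption for illustration and is not the model's actual CORR computation, and the shortcut edges in the example are hypothetical.

```python
from collections import deque

def hops_from(start, n_views, extra_edges=()):
    """Breadth-first hop counts from `start` in a footprint graph.

    R-units for views 0..n_views-1 are chained in order of succession
    (the linear footprint); `extra_edges` adds hypothetical shortcuts.
    """
    adj = {v: set() for v in range(n_views)}
    for v in range(n_views - 1):          # the linear chain
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    for a, b in extra_edges:              # shortcut links
        adj[a].add(b)
        adj[b].add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return [dist[v] for v in range(n_views)]
```

In a pure six-view chain, `hops_from(0, 6)` grows linearly with view index, mimicking a rotation-like dependence on distance; adding the shortcuts `(0, 3)` and `(0, 5)` caps the hop count at 2, flattening that dependence.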

Figure 11 : Coefficient of variation of CORR over views for the two sessions, by complexity, 
after the introduction of shortcuts into the footprint (see text). 

Introducing the shortcuts enhanced the session effect, increasing the significance of
the difference between the regression coefficients of CORR on D for the two sessions
(F(2, 157) = 2.6, p < 0.08; see Figure 12). The effect of shortcuts on the coefficient of
variation of CORR was even stronger (compare Figure 11 with Figure 9). Apparently, with
shortcuts, the first session alone sufficed to bring the CORR characteristics of the different
views to their steady-state values.

Finally, we assessed the generalization ability of the network to novel views of familiar
objects. Rock [12, 13] found that people have difficulties in recognizing or imagining
wire-frame objects in a novel orientation that differs by more than 30° from a familiar one.
We have tested the network model on views obtained by rotating the objects away from
learned views by as much as 24° (see Figure 13). The classification rate was better than
chance over the entire range of rotation, but decreased to 35% at 24° (chance level was 10%
for ten objects). This result establishes another parallel between human performance and
the performance of our model in a recognition task.

Figure 12: Regression of CORR on distance to the best view, by session, after the
introduction of shortcuts into the footprint (see text). To facilitate comparison with human
subject data (Figure 7), 1 − CORR rather than CORR is plotted against the distance to
the best view. Note the flattening of the regression curve in session 2.

Figure 13: Performance of the net on novel orientations of familiar objects (mean of 10
objects; bars denote the variance).

6 Summary 

Two main effects are apparent in our data: (1) the decrease with practice in the variation 
of recognition latency over different views of an object and (2) the disappearance of 
the dependence of the latency on the object’s orientation relative to a canonical view. 
Both these effects support the notion of a tradeoff between the time required for viewpoint
normalization and the memory invested in storing multiple views of objects. A standard
interpretation of the second effect (questioned by some researchers [13]) is in terms of
mental rotation of object representations (e.g., for the purpose of alignment), which becomes
unnecessary when many specific views of objects are stored as a result of practice.

The simulated replication of our psychophysical data by a model that has no a priori
mechanism for “rotating” object representations indicates that a different interpretation
of findings that are usually taken to signify mental rotation is possible. Cooper ([14],
p. 160ff) opened the way to such an interpretation: “one-to-one correspondence between the
intermediate states in a mental rotation and a rotation of an external object need not be
one of a structural isomorphism between the internal representation undergoing mental
rotation and the external object undergoing the physical rotation”. The footprints formed
in the representation layer of our model hint at what the substrate underlying
mental rotation phenomena may look like.

Acknowledgements 

We thank Jeremy Wolfe and Ellen Hildreth for their comments on an early draft of this 
paper. This paper describes research done within the Center for Biological Information
Processing in the MIT Department of Brain and Cognitive Sciences. The Center’s
research is sponsored by grants from the Office of Naval Research (ONR), Cognitive 
and Neural Sciences Division; by the Alfred P. Sloan Foundation; and by the National 

Science Foundation. TP is supported by the Uncas and Helen Whitaker Chair at the 
MIT Whitaker College, by the MIT Artificial Intelligence Laboratory, by Hughes Aircraft 
Corporation, and by the NATO Scientific Affairs Division. SE and DW are supported 
by Chaim Weizmann Postdoctoral Fellowships and by grants from the National Science 
Foundation. HHB is currently with the Department of Cognitive and Linguistic Sciences, 
Brown University. 



References 

[1] S. Edelman, H. Bülthoff, and D. Weinshall. Stimulus familiarity determines recognition strategy for novel 3D objects. A.I. Memo No. 1138, AI Laboratory, MIT, July 1989.

[2] S. Edelman and D. Weinshall. A self-organizing multiple-view representation of 3D objects. A.I. Memo No. 1146, AI Laboratory, MIT, August 1989.

[3] P. Jolicoeur. The time to name disoriented objects. Memory and Cognition, 13:289-303, 1985.

[4] A. Koriat and J. Norman. Mental rotation and visual familiarity. Perception and Psychophysics, 37:429-439, 1985.

[5] A. Larsen. Pattern matching: effects of size ratio, angular difference in orientation and familiarity. Perception and Psychophysics, 38:63-68, 1985.

[6] D. G. Lowe. Perceptual organization and visual recognition. Kluwer Academic Publishers, Boston, MA, 1986.

[7] D. Marr. Vision. W. H. Freeman, San Francisco, CA, 1982.

[8] J. Morton. Interaction of information in word recognition. Psychological Review, 76:165-178, 1969.

[9] S. Palmer, E. Rosch, and P. Chase. Canonical perspective and the perception of objects. In J. Long and A. Baddeley, editors, Attention and Performance IX, pages 135-151. Erlbaum, Hillsdale, NJ, 1981.

[10] S. E. Palmer. The psychology of perceptual organization: a transformational approach. In J. Beck, B. Hope, and A. Rosenfeld, editors, Human and machine vision, pages 269-340. Academic Press, New York, 1983.

[11] R. Ratcliff. Parallel processing mechanisms and processing of organized information in human memory. In J. A. Anderson and G. E. Hinton, editors, Parallel models of associative memory. Erlbaum, Hillsdale, NJ, 1981.

[12] I. Rock and J. DiVita. A case of viewer-centered object perception. Cognitive Psychology, 19:280-293, 1987.

[13] I. Rock, D. Wheeler, and L. Tudor. Can we imagine how objects look from other viewpoints? Cognitive Psychology, 21:185-210, 1989.

[14] R. N. Shepard and L. A. Cooper. Mental images and their transformations. MIT Press, Cambridge, MA, 1982.

[15] R. N. Shepard and J. Metzler. Mental rotation of three-dimensional objects. Science, 171:701-703, 1971.

[16] S. Shepard and D. Metzler. Mental rotation: effects of dimensionality of objects and type of task. J. Exp. Psychol.: Human Perception and Performance, 14:3-11, 1988.

[17] M. Tarr and S. Pinker. Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology, 21, 1989.

[18] S. Ullman. Aligning pictorial descriptions: an approach to object recognition. Cognition, 32:193-254, 1989.




Part 2 

Hands and Tactile Perception 





The Perception of Mechanical Stimuli Through the Skin of the 
Hand and its Physiological Bases 

R.T. Verrillo and S.J. Bolanowski, Jr.

Institute for Sensory Research 
Syracuse University 
Syracuse, NY 13244, U.S.A. 

Abstract 

A series of experiments was performed in which psychophysical measurements were made at 
threshold and suprathreshold levels of stimulation under carefully controlled laboratory 
conditions. A wide range of stimulus parameters was explored including sinusoidal 
frequency, intensity, size of contactor, surround condition, skin temperature, and body site. 
These results are compared to experiments in which responses to sinusoidal displacements 
were measured from receptor nerve fibers in this and other laboratories. Our work has 
culminated in a model of cutaneous mechanoreception that proposes four discrete information 
channels that combine at threshold and suprathreshold levels to signal tactile perception. The
psychophysical channels and their physiological counterparts are:

1) the Pacinian channel, mediated by Pacinian-corpuscle afferents;

2) the Non-Pacinian I channel, mediated by rapidly-adapting afferents (Meissner corpuscles);

3) the Non-Pacinian II channel, mediated by slowly-adapting Type II afferents (Ruffini
endings); and

4) the Non-Pacinian III channel, mediated by slowly-adapting Type I afferents (Merkel
cell-neurite complexes).

These channels operate over specific bands of vibratory frequencies, and they partially
overlap in their absolute sensitivities.



Introduction 

All sensory systems can be divided into three major components. One of these is the receptor 
surface, which is responsible for the transduction of physical energy into neural signals. An 
example of this is the retina, where the rods and cones, together with the bipolar, horizontal,
and amacrine cells, transduce and condition the light energy impinging upon the eye. The
second element is the transmission line that sends information from the receptor surface to
processing areas located more centrally. For vision, this part is considered to be the
axons of the ganglion cells, which form the optic nerve. The third component consists of
the central processing centers, which decipher, modify, and integrate the
sensory information. This is typified by the lateral geniculate nucleus and all of the visual
regions of the cortex. Of course, other schemes can be used to subdivide any sensory 
system. For example, each system can be disassembled along operational and functional 
lines rather than by anatomical structures. Sensory systems such as vision can be divided 
into psychophysically or physiologically defined "channels" which funnel the information 
centrally. One example of this is the chromatic and luminance channels; another is the 
separation of the spatial- and temporal-frequency channels. 

Unlike the eye, which transduces light energy exclusively, the skin contains receptors that 
mediate the perception of a variety of experiences produced by several forms of energy: 
mechanical (vibration and pain), thermal (warmth and cold), and chemical (pain). This 
receptor surface can be divided along anatomical as well as functional lines with the various 
subsystems each possessing its own, partially independent, organization. For example, the
mechanical (i.e., vibratory) aspects of touch, at least for glabrous (hairless) regions of skin,
utilize at least four anatomically distinct mechanoreceptor types, namely Pacinian
corpuscles, Meissner corpuscles, Ruffini endings, and Merkel cell-neurite complexes (Vallbo
and Johansson, 1986; Bolanowski, Gescheider, Verrillo and Checkosky, 1988). The output
of each is transmitted centrally via peripheral nerve fibers called Pacinian-corpuscle (PC)
fibers, Rapidly Adapting (RA) fibers, Slowly Adapting type-I (SA-I) fibers, and Slowly
Adapting type-II (SA-II) fibers. The impulses over these fibers are transmitted to spinal-cord
centers as well as to higher regions of the central nervous system such as the dorsal
column nuclei, thalamus, and cortex. We (Bolanowski, et al., 1988) have recently proposed
that each of the four peripheral pathways of this subsystem may be mapped operationally
onto four psychophysically defined channels. Thus, the perception of the mechanical aspects
of touch in all likelihood comprises a mixture of information originating from all four
channels.

The purpose of this paper is to show in a generic way how mechanical stimuli are
transduced, how the information is encoded, and how information from the four
psychophysically distinct channels may combine to form perceptions of the mechanical aspects of
touch. For this purpose we will focus on the Pacinian corpuscle, the tactile mechanoreceptor 
best understood and perhaps prototypical of all tactile mechanoreceptors. We hope to show 
how this and the other tactile mechanoreceptors can be linked to the four 
psychophysically-defined sensory channels of touch. 

The Pacinian Corpuscle 

The Pacinian corpuscle is a blimp-shaped, capsular receptor found in the lower dermis and 
subcutaneous tissue of all mammals. (See, for example, Cauna and Mannan, 1958; Polacek 
and Mazanec, 1966; Nishi, et al., 1969). It is the largest (1.0 mm by 0.5 mm) 
mechanoreceptor in the skin and is composed of an outer capsule formed by lamellar cells 
which concentrically surround an inner core region (Chouchkov, 1971). The inner core is 
formed by hemilamellae which compress the nerve fiber innervating the capsule (Quilliam 
and Sato, 1955). The single myelinated nerve fiber enters the corpuscle through one pole of
the capsule. Upon reaching the inner core, the fiber loses its myelin sheath and becomes
elliptical in cross section, sandwiched between the hemilamellar cells of the inner core. This
unmyelinated region, which extends the entire length of the corpuscle, is where
mechanotransduction takes place. A complete description of the ultrastructural anatomy of
the Pacinian corpuscle can be found in Spencer and Schaumberg (1973). We have focused
our work primarily on the Pacinian corpuscle since it is prototypical in its response
properties. In fact, the current position of many tactile investigators is that the principles of
transduction are the same in all tactile mechanoreceptors, except that the accessory structures
impose a mechanical filter that determines the frequency response of each receptor type.
This is directly analogous to the chromophore structure of opsin in the retinal receptors called
rods, rhodopsin having an action spectrum in the 460-540 nm range; other photopigments
(e.g., iodopsin) impart the spectral sensitivity of the cone-receptor types. Other factors that
can contribute to the frequency response are response criteria and the intervening tissues
between the receptor surface and the outside world. For example, in vision, the lens and
ocular media filter various wavelengths of light, particularly in the ultraviolet region. For the
vibratory stimuli used in our laboratory, the skin plays no such role: Van Doren
(1989) has shown that the skin is functionally transparent for stimuli in the temporal domain,
although it plays a major role in the spatial domain.
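The idea that a common transducer sits behind different accessory filters can be illustrated with a deliberately simple model. The sketch assumes a first-order high-pass filter for the accessory structure and a fixed transducer threshold; both are hypothetical simplifications, and the model does not reproduce the U-shaped Pacinian curve. It only shows how changing the filter alone shifts the threshold-frequency characteristic.

```python
import math

def highpass_gain(f, fc):
    """Magnitude response of a first-order high-pass filter with
    corner frequency fc (Hz): |H(f)| = (f/fc) / sqrt(1 + (f/fc)**2)."""
    r = f / fc
    return r / math.sqrt(1.0 + r * r)

def threshold_displacement(f, fc, transducer_threshold=1.0):
    """Input displacement needed for the filtered signal to reach a fixed
    transducer threshold. Same transducer, different accessory filter
    (corner fc), hence a different frequency response."""
    return transducer_threshold / highpass_gain(f, fc)
```

With these assumptions, a receptor whose filter has a 10-Hz corner needs a smaller displacement at 5 Hz than one with a 100-Hz corner, even though both share the same transducer threshold; at the corner frequency itself the required displacement is √2 times the transducer threshold.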

Stimulus deformations produce receptor potentials in the unmyelinated portion of the nerve
fiber; if they are of sufficient amplitude, a neural impulse will be generated and passed along
the myelinated fiber to the central nervous system. Figure 1 shows the relationship among a
stimulus, the receptor potential, and the action potential generated by the underlying receptor 
potential for a single, isolated Pacinian corpuscle. In both the physiological and 
psychophysical experiments to be described, sinusoidal bursts of displacements were used as 
the preferred stimulus because the use of sinusoids permits a systems-analysis approach to 
the problem. The power of this approach has been amply demonstrated in the excellent work 
performed on both the auditory and visual systems. The lowest trace in Fig. 1 signifies two 
cycles of the stimulus. For low stimulus intensities (Fig. 1A), only a barely discernible
receptor potential can be seen in response to the stimulus. At moderate stimulus intensities
(Fig. 1B), the corpuscle fires one impulse for every cycle (1:1) of stimulation. With more
intense stimuli (Fig. 1C), a two-spike-per-cycle (i.e., 2:1) firing pattern occurs.

Fig. 1 Relationship among stimulus, receptor potential, and nerve impulses of a Pacinian
corpuscle. Shown is a voltage trace signifying a 50-Hz vibratory stimulus (bottom trace) and
the response of a corpuscle at three different stimulus intensities (A, B, C). The intensity of
stimulation increased from A to C. In the stimulus trace, a negative deflection signifies
compression of the corpuscle. A positive deflection in the response traces of A through C
corresponds to depolarization.

A typical relationship between stimulus intensity and impulse-firing rate is shown in Fig. 2. The firing
rate in impulses per second is plotted as a function of stimulus displacement. (Note: in this
and similar figures to follow, stimulus amplitude is plotted in dB re 1 μm peak.) At lower
stimulus amplitudes (A), the rate-intensity characteristic rises steeply with increases in
stimulus intensity. Plateaus occur in the rate-intensity characteristic at multiples (C) and
submultiples (B) of the stimulus frequency, a phenomenon known as phase locking. At
moderate stimulus displacements, the corpuscles produce a 1:1 firing pattern over a wide
range of stimulus amplitudes (C-D). It is the 1:1 firing pattern that has been suggested as the
neural code used by the nervous system to signal activation of Pacinian corpuscles.
However, it now appears that the code is considerably more complicated (see below). At
sufficiently high stimulus intensities (E), the corpuscle can fire in a 2:1 manner. The
mechanism responsible for producing the rate-intensity functions and the characteristic
phase-locking patterns is the underlying mechanotransductive process revealed by the
receptor potential.

Fig. 2 Rate-intensity characteristics obtained from the corpuscle in response to a 250-Hz
stimulus.

The receptor-potential response is basically an analog conversion of the
mechanical stimulus; impulse initiation produced by the receptor potential is simply an
analog-to-digital conversion. A typical receptor-potential response to a large (50 μm peak)
sinusoidal (50 Hz) displacement is shown in the inset of Fig. 3. The lower trace of the
inset shows the stimulus and the upper trace the receptor potential. Notice that the receptor
potential is highly nonlinear, exhibiting frequency doubling and hysteresis.
These phenomena can best be appreciated by plotting the stimulus amplitude versus the
response in the form of a Lissajous pattern. The function so generated is shown in Fig. 3 and
is traditionally referred to as an input-output (I-O) function. Positive displacements represent
stimulus compression of the corpuscle and negative values represent stimulus withdrawal. The
stimulating probe is statically indented 110 μm into the receptor before being activated in
order to ensure that the probe never fully withdraws from the corpuscle. The I-O function so
obtained is non-monotonic, displaying a roughly linear response to stimulus compression,
with a reversal of the response as seen in the third quadrant of the I-O plot. This phenomenon
can be modeled by a system that produces a response somewhere between half- and
full-wave rectification. Since the rectification is not symmetrical across the zero-displacement
axis, we have termed the receptor-potential I-O function an asymmetric full-wave
rectification. The I-O functions indicate that at small stimulus displacements (i.e., small
excursions along the displacement axis), the receptor potential will be linear. The larger the
stimulus displacement, the more nonlinear the receptor-potential response becomes.

Fig. 3 The input-output transfer function relating the receptor-potential amplitude to the
stimulus. The amplitude of the 100-Hz displacement was 40 μm peak. Static indentation was
110 μm. The inset shows the actual relationship between the stimulus (bottom trace) and the
receptor-potential response. In the generation of the I-O response, the phase relationship
between the response and stimulus was changed from that shown in the inset so that the peak
receptor-potential response was referenced to maximum displacement.

Impulse initiation in the nerve fiber occurs when the transmembrane potential (in this case the
receptor potential) reaches a particular depolarization level in a given amount of time. Since the
I-O function of Pacinian corpuscles is asymmetric, positive-going displacements generate
greater receptor-potential amplitudes than negative-going displacements of the same
magnitude. The receptor-potential depolarization in the first quadrant of the I-O function is
responsible for the low-intensity limb of the rate-intensity function and the 1:1 plateau. The
2:1 firing pattern is produced when the stimuli are large enough to produce the second
depolarization, evidenced by the asymmetric full-wave rectification in the third
quadrant of the I-O function.
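The link between the asymmetric full-wave rectification of the I-O function and the 1:1 and 2:1 firing patterns can be sketched with a piecewise-linear receptor potential. This toy model is an assumption for illustration: compression maps linearly, withdrawal also depolarizes but with a smaller gain (the hypothetical parameter `withdrawal_gain`), and an impulse is counted each time the receptor potential rises through a fixed threshold. It ignores hysteresis, refractoriness, and adaptation.

```python
import math

def receptor_potential(x, withdrawal_gain=0.4):
    """Asymmetric full-wave rectifier: compression (x >= 0) maps linearly;
    withdrawal (x < 0) also depolarizes, but with a smaller gain."""
    return x if x >= 0 else -withdrawal_gain * x

def spikes_per_cycle(amplitude, threshold=1.0, withdrawal_gain=0.4, n=1000):
    """Count upward threshold crossings of the receptor potential during one
    cycle of the sinusoidal displacement amplitude * sin(2*pi*t)."""
    crossings = 0
    prev = receptor_potential(0.0, withdrawal_gain)
    for i in range(1, n + 1):
        x = amplitude * math.sin(2 * math.pi * i / n)
        cur = receptor_potential(x, withdrawal_gain)
        if prev < threshold <= cur:   # rising through threshold -> one impulse
            crossings += 1
        prev = cur
    return crossings
```

With these parameters, an amplitude of 0.5 produces no impulses, 2.0 produces one per cycle (1:1), and 5.0 produces two per cycle (2:1), because only at the largest amplitude does the withdrawal half-cycle also drive the receptor potential past threshold.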

Another important characteristic of systems in general is the frequency response.
Traditionally, the frequency response is obtained by holding the input level constant, varying
the stimulus frequency, and measuring the output. In sensory systems it is much easier to
hold the output constant, as we do in measuring psychophysical detection thresholds (see
below), and to measure the input required to produce it. For the physiological results, these
functions can be generated using receptor-potential responses or neural-impulse firing rates.

Fig. 4 Average frequency characteristics of Pacinian corpuscles. The error bars signify the
standard deviations. The response criterion and number of corpuscles used in the averaging
were: o, 1 spike/cycle, n = 16; •, 1 spike/sec, n = 19.

The experience of a "sensation" requires that the information transduced at the receptor be
transmitted to more centrally located processing regions. For this purpose, all sensory systems
use neural impulses as the digital code that is passed over the transmission portion of the
system. Since one of the goals of our laboratory is to link physiology to psychophysics, we will
focus on frequency characteristics obtained using neural-impulse information. Figure 4 shows
the average frequency response obtained from many corpuscles using two response criteria:
1 impulse per stimulus burst and 1 impulse per stimulus cycle (1:1). The
characteristics are U shaped, with maximal sensitivity around 250 Hz. The figure shows that
the Pacinian corpuscle responds primarily to vibratory stimuli in the range of 40 Hz to 1
kHz. The significance of this will become clear in the psychophysical portion of this report.

The particular code that the tactile or any other sensory system uses to signal a sensory event
is not known at this time, but there are several possibilities. These include, but are not
limited to: firing rate (e.g., impulses/sec); number of impulses per stimulus; number of
impulses per cycle of stimulation; and changes in the stochastic properties of firing,
especially when there is significant noise (spontaneous activity) in the system. When we try
to link psychophysical behavior to underlying physiological mechanisms, we must assume
that the criterion is related to some aspect of the neural activity. This is somewhat
problematic, since different criteria can produce vastly different threshold-frequency
characteristics. This is demonstrated clearly in Fig. 5, the data of which have been replotted
from Johansson, Landstrom and Lundstrom (1982). They obtained responses from human
Pacinian-corpuscle fibers (as well as the other fiber types) to vibratory stimuli by
using the percutaneous electrophysiological technique pioneered by Vallbo and Hagbarth
(1968). Johansson, et al. (1982) plotted their results as equi-intensity contours, and we have
replotted their Pacinian-corpuscle data using three fixed criteria to demonstrate the effects of
response criteria on the frequency characteristics. Figure 5 shows that the frequency response
changes dramatically when different criteria are used. Similar plots made for the other three
fiber types show the same general effects. The criteria that we selected
for analysis were based on a number of considerations, including temporal
summation, spatial summation, temperature, response characteristics of single fibers, and
others, as outlined in detail in Bolanowski, et al. (1988). Before the relation between the
physiological and psychophysical aspects of touch can be established, it is first necessary to
describe the psychophysical methods used and the research results that have allowed
us to define the capabilities of the tactile system.

Fig. 5 Frequency characteristics of Pacinian corpuscles. Data replotted from Johansson, et
al. (1982). The characteristics were generated by first extrapolating and interpolating their
equi-amplitude response profiles and plotting the results as rate-intensity characteristics. The
frequency responses shown here were then constructed from the rate-intensity characteristics
by choosing various response criteria as shown in the figure and plotting the required
stimulus amplitude versus stimulus frequency to attain the required neural response.
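The construction described in the Fig. 5 caption, reading off the stimulus level at which each rate-intensity function reaches a chosen criterion, can be sketched as follows. Amplitudes are expressed in dB re 1 μm peak, as in the figures. The function names are hypothetical, the rate-intensity values in the example below are invented for illustration, and the interpolation assumes strictly increasing firing rates.

```python
import numpy as np

def to_db_re_1um(displacement_um):
    """Convert a peak displacement in micrometres to dB re 1 um peak."""
    return 20.0 * np.log10(displacement_um)

def threshold_at_criterion(levels_db, rates, criterion):
    """Stimulus level (dB re 1 um) at which a rate-intensity curve reaches
    `criterion` impulses/s, by linear interpolation. Assumes `rates` is
    strictly increasing; returns NaN if the criterion is never reached."""
    levels_db = np.asarray(levels_db, dtype=float)
    rates = np.asarray(rates, dtype=float)
    if criterion > rates[-1]:
        return float("nan")
    return float(np.interp(criterion, rates, levels_db))

def frequency_characteristic(curves, criterion_for):
    """curves: dict freq_hz -> (levels_db, rates).
    criterion_for: function freq_hz -> required rate, e.g. lambda f: f for
    1 impulse/cycle, or lambda f: 10.0 for a fixed 10 impulses/s."""
    return {f: threshold_at_criterion(lv, r, criterion_for(f))
            for f, (lv, r) in sorted(curves.items())}
```

For example, with invented curves at 50 Hz (rates 0, 2, 20, 50, 55 impulses/s) and 250 Hz (rates 0, 10, 100, 250, 260 impulses/s) over levels of -40 to 0 dB in 10-dB steps, the 1-impulse/cycle criterion gives -10 dB at both frequencies, whereas a fixed 10-impulses/s criterion makes the 250-Hz threshold (-30 dB) lower than the 50-Hz one: the same data yield differently shaped frequency characteristics depending on the criterion.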



Psychophysical Experiments 

Although four fiber types have been identified in man, until recently it had not been shown 
that all contribute to the sensation of touch. Indeed, about 30 years ago we were firmly
convinced that all sensations originating in the skin were produced by the stimulation of one
type of receptor: a single morphological entity that was excited by vibration, pressure, heat,
cold, and physical damage. The sensations that we experience were thought to be the
product of the spatiotemporal patterning of impulses produced by this ubiquitous but
unidentified receptor that was able to transduce mechanical, thermal, and noxious energies
(Sinclair, 1967).

We can leap over many years of painstaking research by stating simply that when we 
(Verrillo, 1963) measured vibratory detection thresholds as a function of frequency, the 
function had two limbs: one that was relatively flat at low frequencies, and another that was
U shaped between 40 and 1,000 Hz. What was important for the success of those early
experiments was the careful definition of the stimulus. The experimental approach has been
further refined since then, but the basic techniques are still the same: 1) sinusoidal
vibratory displacements are applied to the thenar eminence of the right hand by a circular
contacting surface (contactor) whose size can be varied (0.008 to 2.9 cm²); 2) the
contactors are surrounded by a rigid surface, separated from the contactor by a 1-mm gap,
that confines the deformations to the area of stimulation; 3) displacements are produced
around a static indentation of 0.5 mm into the skin to ensure contact between the skin and the
contactor during the sinusoidal excursions; 4) regardless of the duration of the stimulus,
which can range from 10 to 2400 ms depending upon the experiment, appropriate rise-fall
times are used to avoid onset/offset transients; and 5) to reduce bias in the observers'
responses, thresholds are measured using a two-alternative, forced-choice tracking
procedure (Zwislocki, Maire, Feldman, and Rubin, 1958) designed
for this purpose. The recent addition of maintaining skin-surface temperature to within
±0.5°C by circulating water through hollow chambers in the stimulus apparatus has provided
an even better definition of the stimulus.
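The forced-choice tracking idea can be sketched with a generic two-down/one-up adaptive rule, a common transformed up-down procedure that converges near 70.7% correct. We do not claim this is the exact rule of Zwislocki et al. (1958); the simulated observer, its logistic psychometric function, and all parameter values are assumptions for illustration.

```python
import math
import random

def run_staircase(true_threshold_db, start_db=0.0, step_db=1.0,
                  n_trials=400, slope_db=2.0, seed=0):
    """Simulate a two-down/one-up staircase in a 2AFC detection task.

    The level drops one step after two consecutive correct responses and
    rises one step after any error. Returns the mean level at the reversals
    after the first four, a common estimate of the converged level.
    """
    rng = random.Random(seed)
    level, correct_run, direction = start_db, 0, 0
    reversals = []
    for _ in range(n_trials):
        # Hypothetical observer: 50% guessing floor plus logistic detection.
        p_detect = 1.0 / (1.0 + math.exp(-(level - true_threshold_db) / slope_db))
        if rng.random() < 0.5 + 0.5 * p_detect:
            correct_run += 1
            if correct_run < 2:
                continue          # need two correct in a row before stepping down
            step = -1             # harder: lower the stimulus level
        else:
            step = +1             # easier: raise the stimulus level
        correct_run = 0
        if direction != 0 and step != direction:
            reversals.append(level)   # record each turning point
        direction = step
        level += step * step_db
    tail = reversals[4:] or reversals
    return sum(tail) / len(tail)
```

Because the observer cannot bias the outcome by choosing when to report "yes" (each trial forces a choice between two intervals), the converged level reflects sensitivity rather than response criterion, which is the motivation the text gives for a forced-choice procedure.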

Using the techniques described above and testing thresholds to very low (0.4 Hz) vibratory 
frequencies, we have recently shown that the psychophysical threshold characteristic as 
measured at the thenar eminence of the hand is actually composed of three different portions 
(Bolanowski, et al., 1988). The characteristic is given by the solid circles in Fig. 6 and was
obtained using a large contactor surface (2.9 cm²) with the skin-surface temperature
maintained at 30°C.

The low-frequency portion of the curve extends from 0.4 to 3.0 Hz and appears to be
insensitive to changes in frequency. A second portion is frequency dependent, with an
approximate slope of -5 dB/octave between 3.0 and 40 Hz. The third portion is U shaped
from 40 to 500 Hz, with a slope of -12 dB/octave at the lower frequencies, and is maximally
sensitive in the region of 250 to 300 Hz.
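Slopes quoted in dB/octave can be translated into power-law exponents, which makes the two frequency-dependent portions easier to compare. Assuming displacement thresholds expressed in dB re a fixed amplitude (20 log10 of amplitude), a slope of s dB/octave corresponds to threshold ∝ f^n with n = s / (20 log10 2). The function names below are hypothetical.

```python
import math

def db_per_octave_to_exponent(slope_db_per_octave):
    """Exponent n in threshold ∝ f**n for a slope given in dB/octave.
    One octave of f**n spans 20 * n * log10(2) dB of amplitude."""
    return slope_db_per_octave / (20.0 * math.log10(2.0))

def threshold_change_db(slope_db_per_octave, f_lo, f_hi):
    """Total threshold change (dB) between f_lo and f_hi at a constant slope."""
    return slope_db_per_octave * math.log2(f_hi / f_lo)
```

By this arithmetic, -12 dB/octave gives n ≈ -1.99, i.e., a displacement threshold roughly proportional to f^-2, while -5 dB/octave gives n ≈ -0.83; at a constant -5 dB/octave the threshold falls by about 18.7 dB between 3 and 40 Hz.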

The receptor system responsible for the high-frequency, U-shaped limb of the curve is highly 
sensitive, frequency dependent, and capable of integrating energy over time and space 
(Verrillo, 1962, 1963, 1965, 1966, 1968; and others). Because of its close similarity to 
physiological curves in shape, sensitivity, and position, obtained independently in many 
laboratories, we identified this system as the P-channel. An extensive series of experiments, 
including masking, adaptation, and matching procedures, as well as manipulations of 
stimulus size and duration, confirmed that the U-shaped curve was indeed due to the 
activation of Pacinian corpuscles. The short and long dashed line shows the shape and 
location of the P channel as measured in many experiments (see Verrillo, above, and
Bolanowski and Verrillo, 1982; Gescheider, Frisina and Verrillo, 1979).




Fig. 6 The averaged overall threshold-frequency characteristic (•) obtained from 5 observers
in response to vibratory stimuli presented with a 2.8 cm² contactor to the thenar eminence of
the hand. The curves comprise the four-channel model for taction, each being the
threshold-frequency characteristic of one channel: PC, NP I, NP II, and NP III. See text for
discussion regarding origin of the data. From Bolanowski, et al., J. Acoust. Soc. Amer. 84,
1680-1694 (1988).

Verrillo (1962, 1968) and his colleagues (Capraro, et al., 1979; Gescheider, 1976; and
Verrillo and Bolanowski, 1986) have shown previously that thresholds between 10 and 100
Hz are mediated by a channel different from the P channel. This channel, called NP I, is
believed to be mediated by RA fibers (Lund, 1966; Talbot et al., 1968), which purportedly
innervate Meissner corpuscles. Several experimental series (Verrillo, 1962, 1968) have shown
that NP I, unlike the P channel, does not exhibit temporal or spatial summation. The
short-dashed line in Fig. 6 represents the NP I channel based on current knowledge.

The breakpoint between the low- and middle-frequency portions of the overall threshold
function of Fig. 6 suggested the presence of a third channel operating in the
lowest-frequency region (0.4-3.0 Hz). A series of experiments was designed to explore
this possibility (Bolanowski, et al., 1988). Previous experiments performed in our laboratory
showed that at threshold levels of stimulation the channels are independent and do not interact
with each other. For instance, we have shown that an adapting stimulus that activates
only one channel has no effect on the threshold response of the other channels
(Verrillo and Gescheider, 1977; Gescheider, Frisina and Verrillo, 1979). Using a
signal-masking paradigm we also demonstrated channel independence; that is, masking
stimuli within the operating frequency range of one system had no effect on the
detection threshold of the other systems (Gescheider, Verrillo, and Van Doren, 1982;
Hamer, 1979; Hamer, Verrillo, and Zwislocki, 1983; Verrillo, Gescheider, Calman, and Van
Doren, 1983; Gescheider, O'Malley, and Verrillo, 1983; Gescheider, Sklar, Van Doren, and
Verrillo, 1985). In general, the channel having the lowest threshold at any given frequency
will dominate the detection-threshold response at that frequency.

Fig. 7 Masking functions. Vibrotactile threshold shifts as a function of masker intensity.
The test frequency was 15, 50, 80, or 300 Hz. The solid lines are linear-regression fits to
averaged data from four subjects. The dotted lines are predicted from threshold data. From
Gescheider, et al., J. Acoust. Soc. Amer. 72, 1421-1426 (1982).

This rule is very orderly. As a consequence, masking functions such as those shown in
Fig. 7, covering a wide range of intensities and frequencies, can be used to predict and
uncover the frequency responses of predicted or suspected channels. There is no space here
to describe how the predictions were made (see Gescheider, Verrillo, and Van Doren, 1982).
The figure shows, however, that when the masker (1/3-octave noise centered at 250 Hz) and
the signal (100 Hz, filled circles) are in the same channel, in this case the P channel, the
masker produces a loss of sensitivity (masking) in the response to the signal. When the
detection signal is set at lower frequencies (80, 50, 15 Hz), optimally activating the NP I
system, the signal and masker are in different channels and the masking effect disappears.
This is evidenced by the plateau regions of the masking functions. The position of the
plateau depends upon the frequency of the signal and the threshold of detection at that
frequency. Further increases of intensity ensure that all of the energy is effectively
concentrated in the optimal operating range of the other channel. The result is a resumption
or continuation of the masking effect, but now the masking function again describes the
effect completely within a single system. The breakpoints in the masking functions are
significant in that they define the position at which threshold detection is switched from
one channel to another.
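The lower-envelope rule and the breakpoints in Fig. 7 can be illustrated with a deliberately simplified model. The sketch below is not the prediction method of Gescheider, Verrillo, and Van Doren (1982). It assumes two independent channels, each with a quiet threshold for the test signal. It also assumes, hypothetically, that a masker raises a channel's threshold one-for-one in dB once the masker's level, reduced by a channel-specific coupling loss (`masker_loss_db`, an invented parameter), exceeds that threshold. The detected threshold is simply the lower of the two channel thresholds. All numbers are illustrative, not data.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    quiet_threshold_db: float  # threshold for the test signal with no masker (illustrative)
    masker_loss_db: float      # hypothetical: how much weaker the masker is for this channel

    def masked_threshold(self, masker_db: float) -> float:
        # Idealized masking: once the masker's effective level exceeds the quiet
        # threshold, the channel's threshold rises one-for-one with masker level.
        effective_masker_db = masker_db - self.masker_loss_db
        return max(self.quiet_threshold_db, effective_masker_db)


def detected_threshold(channels, masker_db):
    """Lower-envelope rule: the channel with the lowest threshold mediates detection."""
    best = min(channels, key=lambda c: c.masked_threshold(masker_db))
    return best.name, best.masked_threshold(masker_db)


# Test signal detectable by both channels; the masker mainly excites the P channel.
p = Channel("P", quiet_threshold_db=0.0, masker_loss_db=10.0)
np1 = Channel("NP I", quiet_threshold_db=15.0, masker_loss_db=40.0)
baseline = min(c.quiet_threshold_db for c in (p, np1))

for masker_db in range(0, 81, 10):
    name, threshold = detected_threshold([p, np1], masker_db)
    print(f"masker {masker_db:2d} dB: threshold shift {threshold - baseline:5.1f} dB ({name})")
```

With these made-up values, the shift first rises with masker level while P mediates detection. It then levels off at 15 dB once detection switches to NP I (masker levels 30 to 50 dB). Finally it rises again when the masker begins to affect NP I as well. The result is a qualitative rise, plateau and rise, with the breakpoints marking the change of detecting channel.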

By using the masking paradigm we were able to show that the low- and middle-frequency
portions of the overall threshold-frequency response shown in Fig. 6 were mediated by
separate channels. This automatically established the frequency response of the
low-frequency channel. We termed this channel NP III; its position and frequency
response are shown by the solid line in the figure. Several theoretical arguments explained in
detail in Bolanowski, et al. (1988) led us to believe that NP III is mediated by SA I fibers,
which presumably innervate Merkel-cell neurite complexes.

A third non-Pacinian channel, NP II, the existence of which was originally proposed by
Capraro, et al. (1979), has been shown by Gescheider, et al. (1985) to operate in a
vibratory-frequency range similar to that of the P channel but at a much lower sensitivity. In
these studies, the channel was defined by desensitizing the P channel through the use of a small
stimulus area (i.e., minimizing spatial summation) and by using a masking paradigm to
deactivate NP I. The position and response profile of NP II are given by the long-dashed line
in Fig. 6. Since only four fiber types have been found in glabrous skin, we concluded by
elimination that NP II is mediated by SA II fibers, which connect to Ruffini end organs.

In addition to manipulating the mechanical aspects of the stimulus, we varied the temperature
of the skin systematically in order to determine whether temperature had any effect on the
response characteristics of the four channels (Bolanowski and Verrillo, 1982; Verrillo and
Bolanowski, 1986). We found that the threshold response of the P channel was greatly
affected by temperature, both in its sensitivity level and in its frequency of maximal
sensitivity, which shifted upward with increasing temperature. The effect of temperature on
the NP II channel was also clear-cut, though smaller than that on the Pacinian channel, on both
hairy and glabrous skin. In general, this population of receptors loses sensitivity as the
temperature decreases. The effects of temperature on the other psychophysical channels
(NP I and NP III) were less straightforward. Statistical analysis of the results showed
that the NP I and NP III channels were clearly affected by temperature changes, but it was
not possible to determine the exact manner of the effects because of complex interactions
between temperature and stimulus frequency.

A direct comparison of psychophysically determined average threshold measurements and
thresholds obtained in physiological experiments is shown in Fig. 8. The four panels
represent the four receptor types and the response criteria selected to match the four
psychophysical channels described above. The psychophysical data are replotted from Fig.
6. The psychophysical channels shown in the four panels are: A, Non-Pacinian I (NP I); B,
Pacinian (P); C, Non-Pacinian III (NP III); and D, Non-Pacinian II (NP II). The
physiological data were obtained via the analysis of the data of Johansson, et al. (1982) as
detailed above. The solid points and lines represent the physiological results of the four fiber
types as shown in the four panels of the figure, namely: A, RA; B, PC; C, SA I; and D,
SA II fibers. The response criteria selected for the different fiber types were: A, 1
impulse/stimulus; B, 4 impulses/stimulus; C, 0.8 impulses/sec; and D, 5 spikes/sec (see
Bolanowski, et al., 1988 for an in-depth discussion of why these particular criteria
were selected). The additional curve (solid line without data points) in panel B is the average
result obtained from 5 Pacinian corpuscles isolated from cat mesentery and kept at 33 °C. In
most cases the correlation between the physiological and psychophysical results is good. The
matches are not perfect, but they are close enough to suggest that each psychophysical
channel has its own neural substrate and that all four fiber types can contribute to tactile
sensation. Several factors can explain the differences between the psychophysical and
physiological results, including temperature effects, the presence of spatial and temporal
summation for certain channels, and physiological variability in responses. A more
thorough discussion of these aspects can be found in Bolanowski, et al. (1988).
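The channel-to-fiber correspondences and response criteria given above can be collected into a small lookup table. The sketch below only encodes what the text and Fig. 8 state. The end-organ entries carry the text's own hedges ("purportedly", "presumably"). The last lines mirror the argument by elimination used to assign NP II to SA II fibers. The dictionary layout and the helper `unclaimed_fibers` are illustrative choices, not part of the original study.

```python
# Correspondences stated in the text: fiber type, (proposed) end organ,
# and the physiological response criterion used for Fig. 8.
FOUR_CHANNELS = {
    "P":      {"fiber": "PC",    "end_organ": "Pacinian corpuscle",          "criterion": "4 impulses/stimulus"},
    "NP I":   {"fiber": "RA",    "end_organ": "Meissner corpuscle",          "criterion": "1 impulse/stimulus"},
    "NP II":  {"fiber": "SA II", "end_organ": "Ruffini end organ",           "criterion": "5 spikes/sec"},
    "NP III": {"fiber": "SA I",  "end_organ": "Merkel-cell neurite complex", "criterion": "0.8 impulses/sec"},
}

# The four fiber types found in glabrous skin.
GLABROUS_FIBERS = {"PC", "RA", "SA I", "SA II"}


def unclaimed_fibers(assigned: dict, all_fibers: set) -> set:
    """Fiber types not yet attributed to any psychophysical channel."""
    return all_fibers - set(assigned.values())


# Before NP II was assigned, the other three channels had already been matched to fibers.
already_assigned = {"P": "PC", "NP I": "RA", "NP III": "SA I"}
remaining = unclaimed_fibers(already_assigned, GLABROUS_FIBERS)
print(remaining)  # a single remaining fiber type, hence the assignment by elimination
```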

Fig. 8 Relationship between physiologically measured frequency characteristics of different
fiber types (A, RA; B, PC; C, SA I and D, SA II) and psychophysically obtained
threshold-frequency characteristics (A, NP I; B, P; C, NP III and D, NP II).
Neurophysiological data points are interpolations and extrapolations of the average results
presented by Johansson, et al. (1982) for selected response criteria: A, 1 impulse/stimulus;
B, 4 impulses/stimulus; C, 0.8 impulses/sec and D, 5 spikes/sec. For the P channel (B), an
additional physiological curve (Pacinian corpuscle) has been plotted. This curve is the
average response of excised Pacinian corpuscles (N=6) maintained at 33 °C and for a
response criterion of 4 impulses occurring during the central 200 msec of a 300 msec
stimulus burst. Abscissa: frequency (Hz). From Bolanowski, et al., J. Acoust. Soc. Amer.
84, 1680-1694 (1988).

As mentioned before, in none of the experiments we have performed have we been able to
demonstrate interaction among the P and NP channels at threshold levels of
stimulation. At threshold, we are convinced that the channels are completely independent,
and that the shape of the overall psychophysical characteristic is determined by the channel
most sensitive at any given frequency. At suprathreshold levels of stimulation, however, the
picture is certainly more complex. In some psychophysical tasks, independence of channels
is preserved, but in others there is clear evidence of interaction among the channels. The
phenomenon of enhancement, in which a conditioning stimulus increases the perceived
intensity of a second stimulus, shows very clear evidence of no interaction
between channels (Verrillo and Gescheider, 1975; Gescheider, Verrillo, Capraro, and
Hamer, 1977). If the conditioning stimulus and the test stimulus are in the same channel, the
effect of enhancement is dramatic. However, if the two stimuli activate different channels,
enhancement disappears. On the other hand, if the subject is asked to estimate the summed,
overall subjective intensity of both conditioning and test stimuli, there is clear evidence of
interaction between channels. When the conditioning and test stimuli are set at frequencies
that optimally excite different channels, the combined subjective intensity is greater
than that produced by either channel alone. The interaction between channels in these
experiments is quite clear.

In another set of experiments, we used a method called absolute magnitude estimation
(Stevens, 1957; Hellman and Zwislocki, 1961; Zwislocki and Goodman, 1980; Verrillo,
Fraioli, and Smith, 1969), in which the subject is asked to assign numbers that match
the perceived magnitude of the sensory experience. In our experiments, we asked subjects
to estimate the perceived intensity of vibrotactile stimuli at skin temperatures known to affect
the cutaneous mechanoreceptor channels, either by selectively increasing or by decreasing
activity in them. Our preliminary data show that the growth of perceived intensity of vibration
on the skin is indeed affected by the temperature at the surface of the skin. Furthermore, there
are features of the growth functions that suggest that as the physical intensity of the stimulus is
increased, there is a sequestering of inputs from the less-sensitive channels. Although these
experiments are still in progress, the results agree with the model of four independent
channels. This suggests that at suprathreshold levels the code for perceptual quality may be
considerably more sophisticated and complicated than had been realized previously, requiring
that several channels contribute the information available for this purpose. In other words,
fundamental qualities of sensation such as "pressure", "flutter", and "vibration" may
combine to form the many sensory attributes ascribed to the somatosensory system,
including form, texture, and borders. One implication of the four-channel model is that,
before we can truly understand how sensory experiences such as "roughness", "softness",
and a myriad of others arise, it may be necessary to establish, across all receptor types, the
psychophysical and physiological criterion responses that signal a sensory event in both
single and multiple channels. The development of multichannel-stimulation techniques may
also be necessary in order to engage the requisite channels. This will be especially important
for developing better devices as surrogate inputs for vision and audition, and for designing
appropriate sensory-feedback systems for limb prostheses and robots.
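Magnitude-estimation data are conventionally summarized by a growth function. This is often a power function ψ = kφⁿ (Stevens, 1957), which plots as a straight line on log-log axes. The sketch below fits k and n by ordinary least squares in log-log coordinates. The "data" are synthetic, generated from an assumed exponent of 0.9 and k = 2 with no noise. They are not results from the experiments described here and only show that the fit recovers the generating parameters.

```python
import math


def fit_power_law(stimulus, magnitude):
    """Least-squares fit of log10(magnitude) = log10(k) + n * log10(stimulus).

    Returns (k, n) for the power function magnitude = k * stimulus**n.
    """
    xs = [math.log10(s) for s in stimulus]
    ys = [math.log10(m) for m in magnitude]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope on log-log axes = exponent
    k = 10 ** (mean_y - n * mean_x)    # intercept back-transformed to linear units
    return k, n


# Synthetic example (assumed values): stimulus amplitudes in arbitrary units
# and noise-free estimates generated with k = 2.0 and n = 0.9.
amplitudes = [1, 2, 4, 8, 16, 32]
estimates = [2.0 * a ** 0.9 for a in amplitudes]
k, n = fit_power_law(amplitudes, estimates)
print(f"k = {k:.2f}, n = {n:.2f}")
```

With real data, a single fitted line can hide systematic departures from linearity on log-log axes. For that reason the shape of the whole growth function, not just its mean slope, is usually examined.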







References 



Bolanowski, S.J., Jr.: Intensity and frequency characteristics of Pacinian corpuscles.
Institute for Sensory Research, Syracuse University, Syracuse, NY. Ph.D. dissertation
and Special Rep. ISR-5-20. 1981

Bolanowski, S.J., Jr., Gescheider, G.A., Verrillo, R.T., and Checkosky, C.M.: Four
channels mediate the mechanical aspects of touch. J. Acoust. Soc. Am. 84, 1680-1694
(1988)

Bolanowski, S.J., Jr., and Verrillo, R.T.: Temperature and criterion effects in a
somatosensory subsystem: A neurophysiological and psychophysical study. J.
Neurophysiol. 48, 836-855 (1982)

Capraro, A.J., Verrillo, R.T., and Zwislocki, J.J.: Psychophysical evidence for a triple
system of mechanoreceptors. Sens. Processes 3, 334-352 (1979)

Cauna, N. and Mannan, G.: The structure of human digital Pacinian corpuscles (Corpuscula
Lamellosa) and its functional significance. J. Anat. 92, 1-25 (1958)

Chouchkov, C.N.: Ultrastructure of Pacinian corpuscles in men and cats. Z.F. Mikro-anat.
Forsch. 83, 33-46 (1971)

Gescheider, G.A., Frisina, R.D., and Verrillo, R.T.: Selective adaptation of vibrotactile
thresholds. Sens. Processes 3, 37-48 (1979)

Gescheider, G.A., O'Malley, M.J., and Verrillo, R.T.: Vibrotactile forward masking:
Evidence for channel independence. J. Acoust. Soc. Am. 74, 474-485 (1983)

Gescheider, G.A., Sklar, B.F., Van Doren, C.L., and Verrillo, R.T.: Vibrotactile forward
masking: Psychophysical evidence for a triplex theory of cutaneous mechanoreception.
J. Acoust. Soc. Am. 78, 534-543 (1985)

Gescheider, G.A., Verrillo, R.T., Capraro, A.J., and Hamer, R.D.: Enhancement of
vibrotactile sensation magnitude and predictions from the duplex model of
mechanoreception. Sens. Processes 7, 187-203 (1977)

Gescheider, G.A., Verrillo, R.T., and Van Doren, C.L.: Predictions of vibrotactile masking
functions. J. Acoust. Soc. Am. 72, 1421-1426 (1982)

Hamer, R.D.: Vibrotactile masking: Evidence for a peripheral energy threshold. Institute
for Sensory Research, Syracuse University, Syracuse, NY. Ph.D. dissertation and
Special Rep. ISR-5-18. 1979

Hamer, R.D., Verrillo, R.T., and Zwislocki, J.J.: Vibrotactile masking of Pacinian and
non-Pacinian channels. J. Acoust. Soc. Am. 73, 1293-1303 (1983)

Hellman, R.P., and Zwislocki, J.: Some factors affecting the estimation of loudness. J.
Acoust. Soc. Am. 33, 687-694 (1961)

Johansson, R.S., Landstrom, U., and Lundstrom, R.: Responses of mechanoreceptive
afferent units in glabrous skin of the human hand to sinusoidal skin displacements. Brain
Res. 244, 17-25 (1982)

Polacek, P. and Mazanec, F.: The distribution of myelin on nerve fibers from Pacinian
corpuscles. J. Physiol. 729, 167-176 (1966)

Sinclair, D.: Cutaneous sensation. London: Oxford University Press 1967

Spencer, P.S. and Schaumburg, H.H.: An ultrastructural study of the inner core of the
Pacinian corpuscle. J. Neurocytol. 2, 217-235 (1973)

Stevens, S.S.: On the psychophysical law. Psychol. Rev. 64, 153-181 (1957)

Vallbo, A.B., and Hagbarth, K.-E.: Activity from skin mechanoreceptors recorded
percutaneously in awake human subjects. Exp. Neurol. 27, 270-289 (1968)

Van Doren, C.L.: A model of spatiotemporal tactile sensitivity linking psychophysics to
tissue mechanics. J. Acoust. Soc. Am. 85, 2065-2080 (1989)

Verrillo, R.T.: Investigation of some parameters of the cutaneous threshold for vibration. J.
Acoust. Soc. Am. 34, 1768-1773 (1962)

Verrillo, R.T.: Effect of contactor area on the vibrotactile threshold. J. Acoust. Soc. Am.
35, 1962-1966 (1963)

Verrillo, R.T.: Temporal summation in vibrotactile sensitivity. J. Acoust. Soc. Am. 37,
843-846 (1965)

Verrillo, R.T.: Vibrotactile sensitivity and the frequency response of the Pacinian corpuscle.
Psychon. Sci. 4, 135-136 (1966)

Verrillo, R.T.: A duplex mechanism of mechanoreception. In: The Skin Senses
(D.R. Kenshalo, Ed.), pp. 139-159. Springfield, IL: Thomas 1968

Verrillo, R.T., and Bolanowski, S.J., Jr.: The effects of skin temperature on the
psychophysical responses to vibration on glabrous skin and hairy skin. J. Acoust. Soc.
Am. 80, 528-532 (1986)

Verrillo, R.T., Fraioli, A.J., and Smith, R.L.: Sensation magnitude of vibrotactile stimuli.
Percept. Psychophys. 6, 366-372 (1969)

Verrillo, R.T. and Gescheider, G.A.: Enhancement and summation in the perception of two
successive vibrotactile stimuli. Percept. Psychophys. 18, 128-136 (1975)

Verrillo, R.T. and Gescheider, G.A.: Effects of prior stimulation on vibrotactile thresholds.
Sens. Processes 3, 37-48 (1979)

Verrillo, R.T., Gescheider, G.A., Calman, B.G., and Van Doren, C.L.: Vibrotactile
masking: effects of one- and two-site stimulation. Percept. Psychophys. 33, 379-387
(1983)

Zwislocki, J.J., and Goodman, D.A.: Absolute scaling of sensory magnitudes: a
validation. Percept. Psychophys. 28, 28-38 (1980)

Zwislocki, J.J., Maire, F., Feldman, A.S., and Rubin, H.J.: On the effect of practice and
motivation on the threshold of audibility. J. Acoust. Soc. Am. 30, 254-262 (1958)



Acknowledgements 

This research was supported in part by Grants NS 23933, R01 DC-00098, and P01 
DC-00380, National Institutes of Health, U.S. Department of Health and Human Services. 




Borrowing Some Ideas From 
Biological Manipulators to 
Design an Artificial One 

Vincent Hayward 

McGill Research Center for Intelligent Machines 
McGill University, 3480 University Street 
Montreal, Quebec Canada H3A 2A7 

Abstract. The design of robotic manipulators is a difficult question because
most of the traditional disciplines needed for the design of robots, like kinematics
and dynamics, are analytic and have little synthetic power. We
first discuss design seen as a generative process and suggest that analogy is
a powerful design method. We then discuss a parallel-actuated spherical mechanism
with a large workspace that can be used to construct a complete limb.
The design synthesis is performed by translating ideas borrowed from
the design of biological manipulators.

1 Introduction 

Commercially available robot manipulators exhibit a degree of elegance and
adequacy far from that observed in biological manipulators. Hence, seeking
inspiration from Nature remains quite an appealing approach. In fact, even the
most application-oriented industrial manipulators always bear some degree of
resemblance to human arms: a sequence of articulated bodies with a
distinguishable shoulder, elbow and wrist (see Figure 1, for example), while
submarine manipulators (see Figure 2, for another example) recall crustacean limbs.

This suggests that despite the claim that artificial manipulators really must
match their applications and that no valid reason exists for using anthropomorphism
(and zoomorphism), the models of Nature remain, consciously or
not, an inexhaustible source of inspiration. 1

Design of manipulators entails a decision-making process that concerns
many attributes of the device, encompassing materials, assembly methods,
mechanical structures, computational structures, sensor, motor and motion
transmission technologies, and so on, to achieve a desired level of functional
capacity. The organic quality of biological systems, which any person engaged
in engineering research can easily appreciate, is far from being achieved by
any technological system, except perhaps by those artifacts that have been
developed and refined over centuries. Such examples can be found in hand tools
and musical instruments. The violin, for instance, achieves the integration of
several of the above-mentioned aspects of design at an extraordinary level of
harmony.

1 As J. Phillips puts it: “There is of course no reason to believe that robots (which
are machines) should resemble us or animals, both of which are also machines; but the
occurrence of anthropomorphism in our thinking and the consequent discussion about its
appropriateness in design is almost inescapable” [14].

The general objective of robot manipulator design is to devise a machine
capable of (1) displacing tools within the largest possible amount of space while
minimizing spatial intrusion or interference with the environment, (2)
imparting forces and torques onto the environment in a delicate and controlled
fashion once a desired collision occurs, and (3) moving in free space at high
velocity [4]. The problem stated above separates into two parts: the givens, which
are decided by the design, and the controls, which confer properties not exhibited
by the original device.




Fig. 1. Pair of manipulators designed by Robotics Research Inc. 




Fig. 2. Sketch of a submarine manipulator built by 
International Submarine Engineering Ltd. 



Clearly, the properties defined by design set bounds on what can be achieved
by control. In the domain of kinematics alone, the goal of robotics research is
not to find all possible arrangements (which may be the goal of the
Theory of Mechanisms), but to find the most relevant ones for manipulation.



Most of the effort in robotics research has been concerned with the
development of analytical tools such as kinematics and dynamics, disciplines
that rest on well-established physical principles. However, work on design still
relies mostly on intuition because the synthetic power of these disciplines is
difficult to exploit.

The design of biological systems transcends human comprehension and is
expected to remain so well beyond the foreseeable future. It is, however,
clear that the observation of salient features of examples found in Nature can
lead to insights readily usable in technological systems. This paper attempts
to show that Nature’s examples can point to kinematic and structural solutions
that are quite applicable to current technology and are directly derived
from anatomical features observed in natural limbs.

Contemporary and historical examples of this abound. Robotics takes its
roots in the development of machines to extend human capacities. Thus, the
history of robotics may be traced back at least to the Bronze Age with the
discovery of levers and wheels (rotary motion). Throughout the ages, developments
have been contributed by various civilizations. Examples come from
the Sumerians, the Greeks, the Romans, the Renaissance, the Age of Enlightenment,
and the Industrial Revolution, not to mention the Asian cultures less known in
the West, in a pattern chronologically aligned with the history of technology.
In honor of Tuscany, which hosted this meeting, Leonardo da Vinci should be
singled out as an illustrious precursor of the design methodology based on the
observation of Nature. The following example is particularly relevant to the
theme of this paper.




Fig. 3. This study suggests the emulation of bird wings [Cod. Atl. f. 308r.-a].

Fig. 4. Leonardo envisaged springs to store energy in this “Ornitottero” [Cod. Atl. f. 314r.-b.].



Leonardo made extensive studies of bird wings in an attempt to emulate
flight; see Figure 3, for example. As far as we know, these attempts were
unsuccessful. He probably convinced himself that flight emulation could not
be achieved by wing-flapping mechanisms actuated by human muscular power,
and imagined using springs to provide power; see Figure 4. It is nevertheless
likely that the attempt to utilize aerodynamic forces in a more efficient manner
led him to imagine the famous “air screw”; see Figure 5 [3].

Since Nature optimizes her designs for reasons of which we are not fully aware,
there is limited justification for attempting faithful emulation of these designs.
Rather, the approach might be the re-exploitation of certain design features found
in Nature. It is manifest that biological manipulators are not optimized for many
tasks of interest: a human arm is obviously ill suited to intervene in a nuclear
reactor core. This does not mean that structures observed in Nature cannot be
re-utilized.

An exploratory study of redundancy was our motivating factor for the arm 
design described later in this paper. It was recognized that redundancy is not 
only desirable, but necessary to the design of general purpose manipulators [6]. 
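Kinematic redundancy can be made concrete with the Chebychev-Grübler-Kutzbach mobility count for spatial mechanisms, M = 6(N - 1 - J) + Σ fᵢ. Here N is the number of links including the base, J the number of joints, and fᵢ the freedoms of joint i. For an open serial chain N = J + 1, so M reduces to Σ fᵢ. The sketch below applies this count to an idealized arm model assumed here for illustration: a spherical shoulder, a revolute elbow and a wrist treated as a spherical joint. Any mobility beyond the six freedoms needed to position and orient a rigid body in space counts as redundancy.

```python
# Freedoms of the lower pairs mentioned in this paper (standard values).
JOINT_FREEDOMS = {"revolute": 1, "prismatic": 1, "screw": 1, "planar": 3, "spherical": 3}


def kutzbach_mobility(num_links: int, joints: list[str]) -> int:
    """Spatial Chebychev-Grubler-Kutzbach count: M = 6(N - 1 - J) + sum(f_i)."""
    return 6 * (num_links - 1 - len(joints)) + sum(JOINT_FREEDOMS[j] for j in joints)


def serial_chain_mobility(joints: list[str]) -> int:
    """Open serial chain: N = J + 1 links (base included), so M = sum(f_i)."""
    return kutzbach_mobility(len(joints) + 1, joints)


# Idealized arm (an assumption for illustration): shoulder, elbow, wrist.
arm = ["spherical", "revolute", "spherical"]
mobility = serial_chain_mobility(arm)
redundancy = mobility - 6  # six freedoms suffice to position and orient a body in space
print(mobility, redundancy)
```

Under this idealization the arm has seven freedoms, one more than a non-redundant six-axis industrial manipulator.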

2 Design as Problem Solving 

Ex nihilo nihil fit (nothing comes from nothing): design ideas can most of the time
be traced back to some earlier attempts. 2 In general, design, seen as a
problem-solving activity, is very unconstrained. It has been observed that it is
better described as a process-driven activity than as an optimizing activity.
According to Simon, the design ‘process’ is picked by the designer according to a
complex set of reasons, while the goal may remain fuzzy [18].

Design proceeds by generation: alternative designs are produced in large
numbers until one of them satisfies a set of criteria. Only then can an analytical
optimizing activity take place. In the case of manipulators, only a
surprisingly small number of design processes have been utilized by the industry,
the result being a limited number of design styles, possibly because
robot manipulator technology is quite recent. It is interesting to look back for
a moment at the past few decades during which industrial manipulators were
developed. Apart from a few notable exceptions, current design concepts more
or less follow the machine-tool engineering tradition. This can be observed for
robots used in the automotive industry.

2 For R. Buckminster Fuller: “When you and I speak of design, we spontaneously think
of an intellectual conceptualizing event in which the intellect first sorts out a plurality of
elements and then interarrange them in a preferred manner.” [1].

Fig. 5. Leonardo’s “Air Screw” [Ms. B f. 83v.]
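The generate-then-test pattern described above can be sketched in a few lines. In this toy version, candidate serial chains are enumerated from a small set of joint types and kept only if they pass designer-chosen criteria. The criteria used here are hypothetical stand-ins for the much richer and fuzzier criteria discussed in the text: at least six freedoms, at most three joints, no prismatic joint. Only the surviving candidates would then be handed to analytical optimization.

```python
from itertools import product

FREEDOMS = {"revolute": 1, "prismatic": 1, "spherical": 3}  # freedoms per joint type


def generate(max_joints: int):
    """Generator: every serial chain of 1..max_joints joints."""
    for length in range(1, max_joints + 1):
        yield from product(FREEDOMS, repeat=length)


def acceptable(chain) -> bool:
    """Test: hypothetical criteria a designer might impose."""
    mobility = sum(FREEDOMS[j] for j in chain)
    return mobility >= 6 and "prismatic" not in chain


survivors = [chain for chain in generate(3) if acceptable(chain)]
print(len(survivors), survivors[:3])
```

Exhaustive enumeration only works because this design space is tiny. In practice the generator is guided by experience and analogy, which is the role the text assigns to design ideas borrowed from Nature.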

Most of those manipulators are designed for high positional accuracy and
high rigidity, which makes them adequate for machine-tool-like applications.
A number of difficulties arise when these devices are used
for other kinds of tasks, particularly those involving the control of forces when
in contact with the environment.

Among all existing kinematic structures, a four-bar mechanism for the inner
joints augmented by a three-axis wrist with intersecting axes has emerged over
time as by far the dominant structure, as in a kind of Darwinian evolution
process. Similarly, another kinematic structure, known as the SCARA design
(Selective Compliance Assembly Robot Arm), is overwhelmingly used in precision
assembly applications because of its adequacy for the task (dynamic and
kinematic decoupling along the vertical and horizontal directions).

Manipulator design attempts to satisfy an open set of constraints resulting
in part from the laws of Nature, some of which are captured by the
equations of kinematics and dynamics. Kinematics and dynamics have little
synthetic power: they permit a designer to improve a proposed design through
analysis or optimization, or to determine local features such as the shape of cams.
Sometimes, qualitative exploration of many arrangements in order to reach a
functional goal is possible, as demonstrated by Salisbury in the context of
arm manipulation [16]. Other constraints result from technological feasibility.
These are of course difficult to obtain since they depend on the accuracy
of available information, the risk involved in creating new technologies, and
the rate of improvement. The remaining constraints encompass a set
of desired properties which can be quite arbitrary. These are decided upon
by the designer for reasons that may have to do with experience, tradition,
personality, wit, corporate image, budget, trends, fashion, and so on.

Vastly different motivations may be noticed in discussions pertaining to
robotic designs, and once again two views can be contrasted. One is the analytical,
proof-by-existence approach: “Nature produces systems which utilize real
hardware that operates according to physical principles... the intent [of the
design] is not to imply that the development of such systems will be an easy
task, only that such systems can be developed” [11]. The other is the synthetic,
task-oriented approach: “we feel that what is needed is a medium-complexity end
effector: a device that combines the ease of control characteristic of the simple
grippers with some of the versatility of the complex hands” [20].

As a result, an all-encompassing design goal can never be formalized; instead,
as commented above, a generative method is selected. Possibilities are
matched against the criteria that have been decided upon in advance. Unpromising
alternatives among the successive versions are filtered out in a process which
is reminiscent of a technique known in artificial intelligence as “means-end
analysis.” In this technique, not only are immediate choices made to progress
toward a goal, but choices are also made about the operators that are likely to
lead to progress. The definition of quantitative criteria may help to automate part
of the search process. The final goal is known only once successive generations have
filtered through the constraints. However, it is unlikely that this design process
will ever be reduced solely to an explicit search process, or to an optimizing
process, game-theoretical or otherwise.
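For readers unfamiliar with means-end analysis, the sketch below shows its core loop in a toy setting. It finds a difference between the current state and the goal and picks an operator that removes that difference. If the operator's preconditions are not yet met, it treats them as a subgoal. The state features and operators are hypothetical design steps invented for illustration. Real design operators are neither this crisp nor this few.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    adds: frozenset


def achieve(state: frozenset, goal: frozenset, operators, plan: list, depth: int = 0) -> frozenset:
    """Means-end analysis: remove each state/goal difference, recursing on preconditions."""
    if depth > 10:
        raise RuntimeError("search too deep (possible cyclic preconditions)")
    for difference in sorted(goal - state):
        if difference in state:
            continue  # already achieved while handling an earlier difference
        # Choose an operator relevant to this difference.
        op = next((o for o in operators if difference in o.adds), None)
        if op is None:
            raise ValueError(f"no operator removes difference {difference!r}")
        # Subgoal: satisfy the operator's preconditions first, then apply it.
        state = achieve(state, op.preconditions, operators, plan, depth + 1)
        state = state | op.adds
        plan.append(op.name)
    return state


# Hypothetical design operators.
ops = [
    Operator("select actuators", frozenset({"kinematics"}), frozenset({"actuation"})),
    Operator("choose kinematic structure", frozenset(), frozenset({"kinematics"})),
    Operator("design transmissions", frozenset({"kinematics", "actuation"}), frozenset({"transmission"})),
]
plan: list = []
final_state = achieve(frozenset(), frozenset({"transmission"}), ops, plan)
print(plan)
```

Working backward from the goal orders the steps automatically. Here the kinematic structure is chosen first, then the actuators, then the transmissions.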

Optimality is difficult to include in the robot design activity, because optimality
entails the existence of a well-defined objective function, which conflicts with
the requirement to create a general-purpose machine. It is impossible to conceive
of such a function, since the space over which it would be defined
cannot be known before the end result of the design process has been satisfactorily
described. Nonetheless, a design can be declared optimal with respect
to a particular mathematical model and a particular criterion defined over
the variables of this model. The relevance of the model is then of course an
essential question. It has been our experience that oversimplification leads to
physically non-realizable structures [12].
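A small example shows how optimality is always relative to a chosen model and criterion. Assume, for illustration only, a planar arm with two revolute joints of unlimited range and a fixed total length L = l₁ + l₂. Its reachable workspace is an annulus with outer radius l₁ + l₂ and inner radius |l₁ - l₂|. The workspace area is therefore π[(l₁ + l₂)² - (l₁ - l₂)²] = 4π l₁ l₂, which is maximized at l₁ = l₂ = L/2. The sketch below confirms this numerically with a simple grid search.

```python
import math


def workspace_area(l1: float, l2: float) -> float:
    """Area of the annular workspace of a planar 2R arm with unlimited joint range."""
    return math.pi * ((l1 + l2) ** 2 - (l1 - l2) ** 2)


def best_split(total_length: float, steps: int = 1000) -> float:
    """Grid search for the l1 that maximizes workspace area with l1 + l2 fixed."""
    candidates = [total_length * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda l1: workspace_area(l1, total_length - l1))


L = 1.0
l1 = best_split(L)
print(l1, workspace_area(l1, L - l1), math.pi * L ** 2)
```

Adding joint limits, or switching the criterion to dexterity or stiffness, would in general move this "optimum". This is the sense in which the text ties optimality to a particular model.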

A common methodology first entails the creation of generic modules which
can be instantiated into a collection of devices having scaled properties (size,
power and so on). The advantages of such an approach are well known and
discussed at length in the computer science literature. The principles put forward
in computer science are standardization (interface rules), polymorphism
(hiding implementation), and composition (larger blocks made of smaller
ones). They promote abstraction, reliability, ease of maintenance, and top-down
design. Clearly, these principles apply equally well to electromechanical
design. The second part of this methodology requires a decision
on a framework structure describing how modules inter-relate. In dealing
with complexity, hierarchical organizations are often proposed.

3 Overall Approach 

Some of the properties observed in biological manipulators that can be put to use in technological designs are now discussed. The most general observations fall into two categories: (1) actuation and (2) kinematics and structures. The purpose of this study is to explore the second category in greater detail.

Limbs in Nature come in two varieties: endo-skeletons and exo-skeletons. In the endo-skeleton case, most of the material used passively (bones) is located inside the material used actively (muscles), whereas the opposite situation is observed in the exo-skeleton case (shells). To some degree of approximation, the same opposition is observed in the distribution of material used in compression as compared to that of material used in extension.

So far, the design of artificial manipulators has mostly followed the exo-skeleton case. In contrast, we will follow here the endo-skeleton path (vertebrates), simply following the intuition that natural endo-skeletons seem more agile than exo-skeletons (crustaceans).

In the endo-skeleton case, the most identifiable anatomical elements at a macroscopic scale (anatomy deals with structure and morphology) are muscles, tendons, ligaments, bones, and synovial joints. These elements correspond to a separation of mechanical and structural functions: extension, compression, mobility. We will also attempt to incorporate this separation in our design.

A great deal of mobility in biological endo-skeleton limbs is achieved through joints which approximate revolute pairs (elbow, knee) or spherical pairs (e.g. shoulder, hip, eye). These correspond to the two symmetries that allow continuous surface contact under motion: axial symmetry (revolute) and point symmetry (sphere). The other pairs (planar, prismatic and screw) are not found in natural limbs. An essential element of biological limbs is the spherical pair. Biological systems actuate spherical pairs using parallel actuation. The technological analogue is the parallel manipulator, discussed below in greater detail.

The traditional design of manipulators is based on a completely serial arrangement: a succession of links and joints. Serial manipulators suffer from accumulation of errors, lack of rigidity, and low natural frequencies, drawbacks that can be counteracted with parallel designs [10]. Despite these drawbacks, the serial structure is the most commonly found. One of the reasons might be that serial models lend themselves to easier analytical study than those of parallel manipulators.

Serial robot manipulator technology mostly uses massive metallic structures designed to counteract the cantilever effect. A direct consequence is a very poor weight/load ratio due to the “pyramidal effect”: proximal joints must be designed to drive and support the sum of the distal links and joints.
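The pyramidal effect can be made concrete with a static calculation. The sketch below is a hypothetical model, not taken from the paper: a planar serial arm held horizontally, uniform links with their mass at the midpoint, and an identical actuator mass at every joint. Each joint must hold the weight of everything distal to it. Because the actuator mass is kept constant here, the model understates the real effect, where proximal actuators must also be larger and heavier. The function name `holding_torques` and all numbers are illustrative.

```python
G = 9.81  # gravitational acceleration, m/s^2


def holding_torques(n_links, link_mass, link_length, joint_mass):
    """Gravity torque (N*m) each joint must hold, listed proximal first.

    Hypothetical model: the arm is stretched out horizontally, each link is a
    uniform bar (weight acting at its midpoint), and an actuator of mass
    `joint_mass` sits at every joint. The actuator at joint i has no lever
    arm about joint i, so only the distal actuators contribute.
    """
    torques = []
    for i in range(n_links):
        pivot = i * link_length  # position of joint i along the arm
        tau = 0.0
        for k in range(i, n_links):
            # weight of link k acting at its midpoint
            tau += link_mass * G * ((k + 0.5) * link_length - pivot)
            # weight of the actuator at joint k + 1, if that joint exists
            if k + 1 < n_links:
                tau += joint_mass * G * ((k + 1) * link_length - pivot)
        torques.append(tau)
    return torques


if __name__ == "__main__":
    for i, tau in enumerate(holding_torques(3, 1.0, 0.5, 2.0)):
        print(f"joint {i}: {tau:.2f} N*m")
```

Even in this idealized three-link case, the base joint must hold roughly twenty times the torque of the last joint, which is why proximal actuators and structure dominate the mass budget.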

The principal advantages of serial manipulators are the amount of workspace and the minimization of spatial intrusion. Clearly, what is needed is a combination of serial and parallel kinematics. It is not surprising that natural limbs are partly serial and partly parallel: the skeleton-muscle system creates many closed kinematic loops (quite complex to analyze), yet enough seriality remains to yield workspace (arm-forearm-hand).

A complicated problem in the design of manipulators is the integration of actuators and sensors into the overall structure. Nature integrates sensors directly within the actuators at the microscopic scale and provides motion transmission devices with very small losses (tendons and sheaths). Of course, this idea has been utilized in the design of manipulators and mechanical hands despite numerous practical difficulties. A parallel kinematic structure with linear actuators can be viewed as a deformable truss.

In such a truss design, actuators and sensors can be made part of the structure, thus achieving the high degree of integration that characterizes biological designs. Yet the various parts of the structure can be made easily accessible and similar to one another. This promotes modularity and interchangeability [8].

An additional remark adds weight in favor of the endo-skeleton case. Regardless of the structure chosen, positions, velocities and forces need to be measured to control a manipulator. It is a fact of mechanics that, in a manipulator in action, the greatest velocities and the smallest forces manifest themselves at the exterior parts of the structure. This suggests that force-production elements as well as sensors should be placed as close to the external regions as possible. Passive elements should then be placed inside to complete the structure, which is made possible by the use of trussed structures.

Truss structures also have interesting properties which are quite appealing for limb design: the load on parts of the structure and on joints is always axial, they can be made out of a small vocabulary of elements, and a great deal of knowledge is available on the design of such structures.
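The axial-load property holds for ideal pin-jointed trusses loaded only at their nodes: each member is then a two-force member. The minimal sketch below, using a hypothetical symmetric two-bar truss (the function name and geometry are assumptions for illustration), solves the apex equilibrium and shows that the member force is a single axial value that grows as the truss flattens.

```python
import math


def two_bar_truss_force(span, height, load):
    """Axial force in each bar of a symmetric, pin-jointed two-bar truss.

    Hypothetical geometry: pinned supports at (-span/2, 0) and (+span/2, 0),
    both bars pinned together at the apex (0, height), and a vertical load
    `load` (N, downward) applied at the apex. With pins at both ends and no
    load along its length, each bar is a two-force member, so the force it
    carries acts purely along its axis.
    Returns the compressive force in each bar (N).
    """
    bar_length = math.hypot(span / 2.0, height)
    sin_theta = height / bar_length  # sine of each bar's inclination
    # Horizontal components cancel by symmetry; vertical equilibrium at the
    # apex gives 2 * F * sin(theta) = load.
    return load / (2.0 * sin_theta)


if __name__ == "__main__":
    print(two_bar_truss_force(2.0, 1.0, 100.0))  # steep truss
    print(two_bar_truss_force(2.0, 0.1, 100.0))  # flat truss: much larger force
```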

4 Topological and Geometrical Observations 
Mechanisms may become “singular”. In fact, the map from input coordinates (joint variables) to output coordinates (active link coordinates) displays singularities. To better illustrate this concept we will use topological terms as proposed by Burdick [2]. Homotopy allows one to view mechanisms at “order zero”, that is, to describe their kinematic properties qualitatively. This can be easily grasped by considering a two-link manipulator, Figure 6.




(a) Workspace Boundaries (b) Configuration Manifold 

Fig. 6. Two-link manipulator (a) and its configuration manifold (b), created by “stitching” two sheets together: $0 < \theta_2 < \pi$ and $-\pi < \theta_2 < 0$.

Singularities, described as critical points of the configuration manifold, come in two types. Separating singularities divide the configuration manifold into sheets such that any motion from one to the other must traverse a locus of singularities. Non-separating singularities simply create “holes”. These singularities are situated inside the workspace, and motions subject to constraints on the end-effector must avoid the surrounding region.

The workspace of robot mechanisms is determined by three factors: self-interference of parts, travel limits of actuators, and one special locus of singularity of the separating type. In the case of a planar two-link manipulator, it is easy to see that this locus is a circle centered at the first joint. There is also a geometric interpretation of singularities. In the case of serial manipulators, singularities occur, for example, when the axes of revolute joints align, because two joints become mutually redundant. The manipulator becomes “locked” for motions around a direction perpendicular to the common axis due to the loss of a degree of freedom.
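For the planar two-link arm the separating singularities can be checked directly. The determinant of the position Jacobian is $l_1 l_2 \sin\theta_2$, which vanishes when the arm is fully stretched ($\theta_2 = 0$) or folded ($\theta_2 = \pi$); these configurations map to circles of radius $l_1 + l_2$ and $|l_1 - l_2|$ about the first joint, and the sign of the determinant tells which sheet of Figure 6 a configuration lies on. The sketch below (function names are illustrative) computes the determinant analytically and cross-checks it by finite differences.

```python
import math


def forward_kinematics(l1, l2, t1, t2):
    """End-point (x, y) of a planar two-link arm with joint angles t1, t2."""
    return (l1 * math.cos(t1) + l2 * math.cos(t1 + t2),
            l1 * math.sin(t1) + l2 * math.sin(t1 + t2))


def jacobian_det(l1, l2, t2):
    """Analytic determinant of the position Jacobian: l1 * l2 * sin(t2).

    Zero at t2 = 0 (stretched) and t2 = pi (folded); positive on the sheet
    0 < t2 < pi and negative on the sheet -pi < t2 < 0.
    """
    return l1 * l2 * math.sin(t2)


def numeric_jacobian_det(l1, l2, t1, t2, h=1e-6):
    """The same determinant estimated with central finite differences."""
    def partial(which):
        d1, d2 = (h, 0.0) if which == 0 else (0.0, h)
        xp, yp = forward_kinematics(l1, l2, t1 + d1, t2 + d2)
        xm, ym = forward_kinematics(l1, l2, t1 - d1, t2 - d2)
        return (xp - xm) / (2 * h), (yp - ym) / (2 * h)

    (dx1, dy1), (dx2, dy2) = partial(0), partial(1)
    return dx1 * dy2 - dx2 * dy1


def reach(l1, l2, t2):
    """Distance from the first joint to the end point."""
    return math.sqrt(l1 ** 2 + l2 ** 2 + 2 * l1 * l2 * math.cos(t2))
```

With $l_1 = l_2$ the inner circle shrinks to the base point, leaving the single outer boundary circle described in the text.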

As an example, we will illustrate this interpretation on the advanced manipulator designed by Salisbury and Townsend, described in these proceedings. This arm, whose geometry is shown in Figure 7, has two elongated links.






It has been designed so as to be able to utilize the entire surface of its links in contact tasks. Thus, complete mobility of both links is essential. From a geometrical viewpoint, the objective of orienting the two links arbitrarily in space is completely achieved (four parameters, four joints).

However, the existence of a “hole-creating” singularity when joints 1 and 3 are aligned prevents full usage of the arm within its workspace, although the arm can freely maneuver around it.

The problem of loss of mobility of serial manipulators can be treated with supplementary joints which enhance the global mobility of the mechanism in such a way that local loss of mobility can be counteracted by kinematic redundancy [6]. The example of a four-revolute-joint mechanism which provides full orientation capability has been worked out by Long and Paul [13]. This strategy has only limited applicability, for a number of reasons. Adding more serial joints only increases the problems that affect serial manipulators, such as accumulation of errors and degradation of dynamics, alluded to earlier.

In addition, augmenting the number of revolute joints does not remove any singularities, for reasons that are clear from Burdick’s topological arguments. In fact, the more serial joints are added, the more complex the topological map of the manipulator becomes, and the more complex its control and programming become. Thus this possibility for designing a highly dextrous manipulator has been discarded. We now turn our attention to parallel manipulators, since we intend to include them in the design.

As described by Hunt [10], singularities of parallel manipulators also occur in special geometric situations where motions cannot be controlled by the actuators (e.g. a piston and crank system when the crank is fully extended or retracted). In other words, the actuated joint velocities vanish for finite motions of the mechanism.
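The piston and crank example can be checked numerically. For an in-line slider-crank with crank radius $r$ and rod length $l$, the piston position is $x(\theta) = r\cos\theta + \sqrt{l^2 - r^2\sin^2\theta}$, and its derivative $dx/d\theta$ vanishes at $\theta = 0$ and $\theta = \pi$. There, a small crank rotation produces no first-order piston motion, so a piston acting as the actuator cannot control the crank. The sketch below uses illustrative function names and is not taken from the paper.

```python
import math


def piston_position(theta, r, l):
    """Distance from the crank axis to the piston pin (in-line slider-crank).

    theta: crank angle (rad), r: crank radius, l: connecting-rod length (l > r).
    """
    return r * math.cos(theta) + math.sqrt(l ** 2 - (r * math.sin(theta)) ** 2)


def piston_rate(theta, r, l):
    """dx/dtheta: piston velocity per unit crank angular velocity.

    It is zero at the dead centres theta = 0 and theta = pi, where the
    piston (the actuator) loses control of the crank (Hunt-type singularity).
    """
    s, c = math.sin(theta), math.cos(theta)
    return -r * s - (r ** 2 * s * c) / math.sqrt(l ** 2 - (r * s) ** 2)
```

Inverting this relation, the crank speed produced by a given piston speed is the piston speed divided by $dx/d\theta$, which grows without bound near the dead centres.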

It is possible to classify the singularities of parallel manipulators into three types [5]: singularities of the sheet-separating type, which occur when one of the serial subchains of the mechanism is singular (loss of mobility); singularities of Hunt type (loss of controllability), see Figure 8; or both. The third case occurs only in special configurations where two singularities meet, which can be avoided by design. The important observation that we will use in the next section is that a parallel mechanism generally loses controllability inside its workspace, but may retain mobility in large portions of it. Of course, biological manipulators do not escape these laws.

Fig. 7. (a) Geometry of WAMS. (b) Link-1 cannot rotate around axis Y.

The human shoulder, for instance, has a very large workspace. Its large controllability region can be attributed to redundancy in actuation, and this will lead us to utilize a similar method to eliminate singularities of Hunt type in large regions of a parallel manipulator. The results of the above discussion are now utilized to formulate the design of a mechanism that does not display singularities in a large portion of its workspace.

Fig. 8. (a) Loss of controllability. The platform can undergo small rotations while the actuators’ velocities vanish. (b) Loss of mobility. The platform is only able to rotate. (o: actuated joint; •: free joint.)

5 Kinematic Synthesis



Consistent with the goal of achieving a large workspace and limited spatial intrusion, it seems difficult to avoid the general architecture consisting of two elongated links assembled by a revolute joint. Such a manipulator, using a three-revolute-joint assembly at each end, was first described in the 1970s by Takase, Inoue and Sato [19] and later discussed by Hollerbach [9]. As shown by Yoshikawa [22], its kinematic decoupling greatly simplifies many aspects of control, in particular when the task prescribes the hand motions while collisions need to be avoided. Nevertheless, as noted before, this architecture still possesses “hole-creating” singularities which defeat some of its advantages.

In addition, such a manipulator requires cascading seven joints, which makes it difficult to obtain good dynamics and accuracy. Following Nature’s example, it seems possible to achieve a similar amount of workspace using instead a parallel-type mechanism at each end. This leads to the general architecture shown in Figure 9.

Fig. 9. General architecture.






Such an architecture can only be useful if a sufficient amount of workspace can be obtained from these parallel joints. We have seen that parallel mechanisms make it difficult to achieve a large controllable workspace. The main point is that a major source of workspace limitation in parallel mechanisms is Hunt-type singularities. In fact, this difficulty can be overcome by once again drawing inspiration from biological joints.

For example, the shoulder joint has a large number of muscles to control it. In certain positions, some of these muscles clearly cannot contribute to certain motions, but the overall joint is assembled in such a way that when some muscles lose their influence on the output, there are always others to supplement them.

This idea can be readily used in parallel mechanisms. A simple arrangement of a spherical mechanism, Figure 10, displays a debilitating singularity right in the middle of its workspace. Because of the underlying topological properties of its kinematic map, this does not depend on the geometry of the mechanism: regardless of the placement of the actuators, the singularity will always exist. Now consider again a planar-type parallel manipulator as shown in Figure 11.

In the middle of its workspace, the addition of one actuator compensates for the loss of controllability. In fact, we have shown that the addition of only one actuator can remove Hunt singularities from a very large portion of the workspace of our initial design. The mathematical details of the proof are beyond the scope of this paper, but can be found in [7]. The arrangement shown in Figure 12 possesses a useful range of motion with no self-interference of parts, and high and smooth dexterity over the range 120° × 180° × 270°.
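The effect of one redundant actuator can be illustrated on a toy model that is much simpler than the wrist of [7]: a point platform moving in the plane (two freedoms), driven by linear actuators (legs) anchored at fixed base points. Leg-length rates are $A v$, where the rows of $A$ are the unit vectors from each base point to the platform. When two legs are collinear with the platform (Figure 8a style), $A$ loses rank and the platform can move sideways while every leg rate vanishes. A third leg in another direction restores full rank. The model, the function names, and the coordinates are all hypothetical.

```python
import math


def leg_directions(platform, bases):
    """Unit vectors from each fixed base point to the platform point.

    These are the rows of the actuator Jacobian A: leg-length rates = A v,
    where v is the platform velocity.
    """
    rows = []
    for bx, by in bases:
        dx, dy = platform[0] - bx, platform[1] - by
        length = math.hypot(dx, dy)
        rows.append((dx / length, dy / length))
    return rows


def min_eig_AtA(rows):
    """Smallest eigenvalue of the symmetric 2x2 matrix A^T A.

    Zero means some platform velocity v gives A v = 0: the platform can move
    while every leg-length rate vanishes (a Hunt-type singularity).
    """
    a = sum(u * u for u, _ in rows)
    b = sum(u * w for u, w in rows)
    d = sum(w * w for _, w in rows)
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return mean - radius


def leg_rates(rows, velocity):
    """Leg-length rates A v for a platform velocity v = (vx, vy)."""
    return [u * velocity[0] + w * velocity[1] for u, w in rows]


if __name__ == "__main__":
    platform = (0.0, 0.0)
    two_legs = leg_directions(platform, [(-1.0, 0.0), (1.0, 0.0)])
    three_legs = leg_directions(platform, [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
    print(min_eig_AtA(two_legs), min_eig_AtA(three_legs))  # singular vs. not
```

Using the smallest eigenvalue of $A^T A$ (the squared smallest singular value of $A$) rather than a determinant keeps the same test valid for the non-square, actuator-redundant case.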
Once physical considerations such as the size and stroke of actuators are taken into account, these figures may be reduced somewhat. Nonetheless, we have constructed a hydraulically actuated prototype which exhibits a 100° × 100° × 180° useful range. If desired, it can even be made isotropic, that is, optimally dextrous, for several configurations.

Fig. 11. (a) Hunt-type singularity. The reader might agree that it is hard to resist the idea of adding one actuator, as in (b).








The measure of dexterity is based on the condition number of the Jacobian matrix of the kinematic map [15]. It has several physical interpretations, including mechanism accuracy and a measure of quality for the transmission of forces and velocities from actuator coordinates to output coordinates. Details





Fig. 12. General concept of the actuator-redundant wrist and illustration of its dextrous workspace.
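As an illustration of the dexterity measure (not of the wrist itself), the sketch below computes the condition number $\sigma_{max}/\sigma_{min}$ of a planar two-link Jacobian. A value of 1 means isotropy: forces and velocities are transmitted equally well in all directions. A value that grows without bound signals an approaching singularity. The configuration $l_2/l_1 = \sqrt{2}/2$, $\theta_2 = 135°$ is one that can be checked by hand to be isotropic. The function names and link lengths are illustrative assumptions.

```python
import math


def planar_2r_jacobian(l1, l2, t1, t2):
    """Position Jacobian of a planar two-link arm, as a 2x2 nested list."""
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def condition_number(J):
    """sigma_max / sigma_min of a 2x2 matrix, via the eigenvalues of J^T J."""
    (a, b), (c, d) = J
    # J^T J = [[p, q], [q, r]]
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    mean = (p + r) / 2.0
    radius = math.sqrt(((p - r) / 2.0) ** 2 + q * q)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 0.0:
        return math.inf  # singular configuration
    return math.sqrt(lam_max / lam_min)


if __name__ == "__main__":
    l1, l2 = 1.0, math.sqrt(2) / 2
    for deg in (1, 45, 90, 135, 179):
        J = planar_2r_jacobian(l1, l2, 0.0, math.radians(deg))
        print(deg, round(condition_number(J), 3))
```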



6 A Complete Arm 

The integration of the spherical mechanism into a complete arm design will achieve the goal of creating an arm with limited seriality (three links) and kinematic redundancy as seen from the task (seven degrees of freedom, providing self-motion, that is, finite motions with the hand fixed).




This design closely follows Fuller’s Tensegrity Principle.









Parallel actuation will lead to high bandwidth and rigidity, as well as providing the basis for elaborating a truss assembly. In addition, this manipulator has no “hole-creating” singularities, since no revolute joints can align, nor Hunt-type singularities within a reasonably large workspace. The only singularity left corresponds to the limit of the position workspace, when the arm is completely stretched. See Figure 13 for a sketch of the design concept of this arm.

Of course, there are many possible variations on this theme. In particular, it would be interesting to relocate the actuators of the distal links. Some notable successes in this area have already been achieved [17, 21].

7 Conclusion 

It has been argued that, throughout the history of technology, analogies with biological systems have successfully led to insights into innovative designs. Many papers in these proceedings will certainly add weight to this idea.

Introspection has then been used to describe a “design process”, directed by analogies with biological manipulators, aimed at proposing a novel type of robot manipulator which is realizable with existing technology and which possesses a number of desirable properties.

References 

[1] Buckminster Fuller, R. 1985. Inventions: The Patented Works of R. Buckminster Fuller. St. Martin’s Press.

[2] Burdick, J. 1989. Kinematic analysis of redundant manipulators: A topological perspective. In Robots with Redundancy: Design, Sensing and Control, NATO Series, A. Bejczy (Ed.), Springer Verlag, in press.

[3] Chanchi, M. 1984. Les Machines de Leonard de Vinci. Becocci Editore, Milano (translated from Italian).

[4] Goertz, R. C. 1963. Manipulators used for handling radioactive material. In Human Factors in Technology, E. M. Bennett (Ed.), Chapter 27, McGraw-Hill.

[5] Gosselin, C. 1988. Kinematic analysis, optimization and programming of parallel robotic manipulators. Ph.D. Dissertation, Dept. of Mech. Eng., McGill University.

[6] Hayward, V. An analysis of redundant manipulators from several viewpoints. In Robots with Redundancy: Design, Sensing and Control, NATO Series, A. Bejczy (Ed.), Springer Verlag, in press.

[7] Hayward, V., Kurtz, R. 1991. Modeling of a parallel wrist mechanism with actuator redundancy. In Advances in Robot Kinematics, S. Stifter and J. Lenarcic (Eds.), Springer Verlag.

[8] Dietrich, J., Hirzinger, G., Gombert, B., Schott, J. 1989. On a unified concept for a new generation of light-weight robots. In Experimental Robotics I, V. Hayward, O. Khatib (Eds.), Lecture Notes in Control and Information Sciences 139, Springer Verlag.

[9] Hollerbach, J. 1985. Optimum kinematic design for a seven degree of freedom manipulator. In Robotics Research: The Second International Symposium, H. Hanafusa and H. Inoue (Eds.), MIT Press.

[10] Hunt, K. H. 1983. Structural kinematics of in-parallel-actuated robot arms. ASME J. of Mechanisms, Transmissions, and Automation in Design, Vol. 105, pp. 705-712.

[11] Jacobsen, S. C., Iversen, E. K., Knutti, D. F., Johnson, R. T., Biggers, K. B. 1986. Design of the Utah/MIT dextrous hand. IEEE Conf. Robotics and Automation.

[12] Kurtz, R., Hayward, V. 1992. Multi-goal optimization of a parallel mechanism with actuator redundancy. IEEE Transactions on Robotics and Automation, Vol. 8, No. 5.

[13] Long, G. L., Paul, R. P. 1988. Avoiding orientation singularities with a four-revolute-joint spherical wrist. The Second Workshop on Manipulators, Sensors and Steps Toward Mobility, Salford, England.

[14] Phillips, J. 1984. Freedom in Machinery, Vol. 1: Introducing Screw Theory. Cambridge University Press, Cambridge.

[15] Salisbury, J. K., Craig, J. J. 1982. Articulated hands: force control and kinematic issues. The Int. J. of Robotics Research, Vol. 1, No. 1.

[16] Salisbury, K. 1987. Whole arm manipulation. In Fourth Int. Symposium on Robotics Research, R. C. Bolles and B. Roth (Eds.), MIT Press.

[17] Salisbury, J. K., Townsend, W. T., Eberman, B. S., DiPietro, D. 1988. Preliminary design of a whole-arm manipulation system (WAMS). Proc. 1988 IEEE Int. Conf. Robotics and Automation, Philadelphia, PA.

[18] Simon, H. A. 1985. The Sciences of the Artificial. MIT Press.

[19] Takase, K., Inoue, H., Sato, K. 1974. The design of an articulated manipulator with torque control ability. Fourth International Symposium on Industrial Robots.

[20] Ulrich, N., Paul, R. P., Bajcsy, R. 1988. A medium-complexity compliant end effector. IEEE Conf. Robotics and Automation.

[21] Vertut, J., Marchal, P., Debrie, G., Petit, M., Francois, D., Coiffet, P. 1976. The MA 23 bilateral servomanipulator system. Proc. of the 24th Conf. on Remote Systems Technology, Washington, pp. 175-187.

[22] Yoshikawa, T. 1984. Analysis and control of robot manipulators with redundancy. In Robotics Research: The First International Symposium, M. Brady and R. Paul (Eds.), MIT Press.




Mechanical Design for Whole-Arm Manipulation 



William T. Townsend and J. Kenneth Salisbury

Artificial Intelligence Laboratory 
Massachusetts Institute of Technology 
Cambridge, MA 



Abstract 

This paper describes the performance requirements and mechanical design of an 
arm designed and built at MIT for whole-arm manipulation. Whole-arm 
manipulation began as a research objective to explore the benefits of manipulating 
objects with all surfaces of a robotic manipulator — not just the fingertips of an 
attached robotic hand. The need for robust environment contact by all surfaces of 
the robotic hardware prompted a re-evaluation of traditional manipulator design 
requirements and spurred the invention of new transmission mechanisms for 
robots. 



1 Introduction - Whole-Arm Manipulation 

Salisbury [Salisbury 87, Salisbury 88] introduced the concept of whole-arm manipulation 
(WAM) to address a broad range of tasks. As a tool for WAM experiments, Townsend 
[Townsend 88A] designed and built the WAM manipulator, intended for contact and inter- 
action with the environment by using any of its link surfaces. Conventional manipulators, 
by contrast, must contact the environment with only the inside surfaces of an attached 
gripper or the fingertips of an attached hand. Often it is useful, if not inevitable, to 
contact the environment with other parts of the arm. 

There are numerous examples where whole-arm manipulation is important. Obstacles 
that today’s robots try to avoid can be used for leverage [West 87] or to guide a robot 
toward its goal. Furthermore, a human may use his shoulder for mechanical advantage to 
budge a heavy box, or he may carry firewood between his upper and fore arms by using 
these limbs as force-controlled grippers. It’s hard, for example, to imagine an Olympic- 
style wrestler winning his match by using only his finger tips! He must control, with 
tremendous strength, speed, and agility, positions and forces along many parts of his 
body simultaneously. 

Robust, high-performance force control is important to WAM since the system is intended to control contact forces between objects in the environment and any part of its mechanism. This is accomplished by controlling joint torques directly and inferring contact forces rather than measuring them explicitly through a wrist sensor [Salisbury 86]. Figure 1 shows the simplest example of this method of inference. Joint torques $\tau_1$ and $\tau_2$ are applied (and known) by the controller, and $L$ is the length of the inner link. The equilibrium equations of the two joint torques provide two independent equations for solution of the location of a single contact, $x$, and a perpendicularly applied contact force, $f$:

$$f = \frac{\tau_1 - \tau_2}{L} \qquad (1)$$

and

$$x = \frac{L\,\tau_2}{\tau_1 - \tau_2} \qquad (2)$$

Figure 1: Determining force location and magnitude.

With four independent degrees of freedom we can determine the magnitude and line of action of a single contact force on the last link.
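The inference can be written as a short function. The sketch assumes a planar model that the excerpt does not fully specify: the contact force $f$ acts perpendicular to the outer link at distance $x$ from joint 2, so that $\tau_2 = f x$ and $\tau_1 = f(x + L\cos\theta_2)$. For a straight arm ($\theta_2 = 0$) this gives $f = (\tau_1 - \tau_2)/L$ and $x = L\tau_2/(\tau_1 - \tau_2)$. The `theta2` parameter is a derived generalization added for illustration, and the function name is hypothetical.

```python
import math


def infer_contact(tau1, tau2, L, theta2=0.0):
    """Infer a single perpendicular contact force on the outer link.

    Assumed planar model: inner link of length L, outer link at relative
    angle theta2 (0 = straight), contact force f applied perpendicular to
    the outer link at distance x from joint 2, so that
        tau2 = f * x
        tau1 = f * (x + L * cos(theta2)).
    Returns (f, x).
    """
    lever = L * math.cos(theta2)
    if abs(lever) < 1e-12:
        # tau1 == tau2 == f * x: one equation, two unknowns
        raise ValueError("links perpendicular: f and x cannot be separated")
    f = (tau1 - tau2) / lever
    if f == 0.0:
        raise ValueError("no contact force detected on the outer link")
    x = tau2 / f
    return f, x


if __name__ == "__main__":
    # Forward-simulate a 5 N contact 0.3 m beyond joint 2 on a straight arm,
    # then recover it from the joint torques alone.
    L, f_true, x_true = 0.6, 5.0, 0.3
    print(infer_contact(f_true * (x_true + L), f_true * x_true, L))
```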



2 Special Requirements for Whole-Arm Manipulation

Whole-arm manipulation tasks require a re-evaluation of the performance requirements which guide the design of robotic hardware. This section describes five requirements which helped guide the design of the WAM arm.

2.1 Large Dynamic Range of Force Controllability 

The dynamic range of force controllability is the maximum controllable force (strength) 
divided by the minimum controllable force (accuracy) of the manipulator at a single point 
of contact. This ratio is useful in maximizing the range of tasks a particular manipulator 
can perform. One can increase without bound the maximum controllable force of a manip- 
ulator by giving it higher-torque actuators and bigger, stronger links. Similarly, one can 
achieve very small controllable forces by building a small, light manipulator with lightly 
preloaded bearings and sensitive force sensors. However, intricate tasks often demand the 
application and sensing of a broad range of forces. 

Dynamic range of force control is selected in place of either strength or accuracy 
in order to address task performance more directly. This measure is dimensionless and 
independent of scale. 

Since dynamic range is limited by the maximum and minimum controllable forces, we examine each of these separately. The maximum controllable force is limited by the motor-torque saturation limit times the transmission ratio, the strength of the transmission, and the strength of the links. In a system without explicit force feedback, the minimum controllable force is limited by torque ripple, dry friction, and deadband in the motor controller. In a system with force feedback, limit cycles arising from the combination of highly nonlinear elements and feedback control often limit the minimum controllable force.
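The definition above can be expressed as a small calculation. This is a sketch under simplifying assumptions: every upper limit (motor saturation torque times ratio, transmission strength, link strength) is expressed as a joint torque and converted to a contact force through one lever arm, and the minimum controllable force is taken as a single given number. The function name and all values are hypothetical.

```python
def dynamic_range(motor_torque_limit, transmission_ratio, transmission_strength,
                  link_strength, lever_arm, min_force):
    """Maximum / minimum controllable force at a single contact point.

    Assumed simplification: all strength limits are given as joint torques
    (N*m) and mapped to the contact through `lever_arm` (m). `min_force` (N)
    is the smallest controllable force, set by torque ripple, dry friction,
    deadband, or limit cycles.
    """
    max_joint_torque = min(motor_torque_limit * transmission_ratio,
                           transmission_strength,
                           link_strength)
    max_force = max_joint_torque / lever_arm
    return max_force / min_force  # dimensionless


if __name__ == "__main__":
    # Motor-limited case: 1 N*m * 20 = 20 N*m < 30 N*m < 50 N*m
    print(dynamic_range(1.0, 20.0, 30.0, 50.0, 0.5, 0.05))
```

The ratio improves either by raising the weakest upper limit or by lowering the friction and ripple floor, which is why the two limits are examined separately.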



2.2 Robustness 

The manipulator must be robust. We attach a broader definition to the word “robust” 
than do researchers of systems and controls. By robust we mean that the manipulator be 
able to perform its tasks reliably and without suffering damage. A robust manipulator 
then must be 

• dynamically stable and 

• mechanically durable 
under all conditions. 

Dynamic Stability: 

Maintaining dynamic stability is the type of robustness most commonly referred to in 
systems and controls. This sense of robustness requires that a dynamically stable manip- 
ulator remains dynamically stable in the face of changing inputs, disturbances, payload, 
contact stiffness, and arm configuration. Advanced techniques such as sliding-mode con- 
trol have been developed which deal directly with parameter uncertainty [Slotine 84]. 
Mechanical Durability: 

To maximize mechanical durability, on the other hand, we want to minimize any impact- 
induced force and to minimize fragility so that the manipulator can bash around, exploring 
an uncertain environment without damaging itself. We can improve the survivability 
of the manipulator by using tough materials, tucking fragile transmissions and sensing 
mechanisms inside the load-carrying structure, adding protective coverings around the
links and joints, and minimizing the forces of impact. 

By equating kinetic energy before collision to potential energy during collision, we find that the maximum impact force, $F_{c,\text{impact}}$, is

$$F_{c,\text{impact}} = v_i\sqrt{J_i\,k_c} \qquad (3)$$

where

$J_i$ is the inertia of the link measured at the point of contact,
$k_c$ is the contact stiffness, and
$v_i$ is the contact velocity.

In order to minimize impact forces, the designer must minimize the backdriven mass, 
increase the contact compliance (perhaps with a soft covering), and limit velocity of the 
moving mass. 
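Equation (3) follows from $\tfrac{1}{2}J_i v_i^2 = \tfrac{1}{2}F^2/k_c$, the energy of a linear contact spring compressed until the moving inertia stops. The sketch below evaluates it with illustrative numbers, assuming a linear contact stiffness and no damping. It shows the design levers named in the text: force scales linearly with velocity and with the square root of both inertia and stiffness.

```python
import math


def impact_force(inertia, contact_stiffness, velocity):
    """Peak contact force for a moving inertia striking a linear spring.

    Energy balance 0.5*J*v**2 = 0.5*F**2/k gives F = v*sqrt(J*k).
    inertia: effective mass at the contact (kg), stiffness (N/m), v (m/s).
    Assumes an undamped, linear contact.
    """
    return velocity * math.sqrt(inertia * contact_stiffness)


if __name__ == "__main__":
    print(impact_force(2.0, 8.0, 1.5))   # baseline
    print(impact_force(2.0, 2.0, 1.5))   # 4x softer covering: half the force
    print(impact_force(0.5, 8.0, 1.5))   # 4x lighter backdriven mass: half the force
```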

2.3 High Bandwidth 

High bandwidth of force and position control is important in manipulators used for assembly, where the cycle time of tasks is critical. Bandwidth is also important for controlling forces against shaking and undulating environments. For position-controlled manipulator designs, maintaining high bandwidth may be an unconscious decision; but, since some designers believe that force-controlled manipulators should be naturally compliant [Nevins 73, Andeen 88], there is a danger that their resulting designs will exhibit significantly lower mechanical bandwidth.



2.4 High Aspect Ratio 



We define the aspect ratio of a link as its length, $L$, divided by its width, $W$. When the aspect ratios are high, the links are long and slender. In all manipulators, increasing the aspect ratio increases the unobstructed workspace and allows the manipulator links to reach in and around obstacles in the environment more easily.

High-aspect-ratio links are better at grasping and manipulating, as shown in Figure 2. This figure illustrates a serial-link manipulator trying to grasp a cylindrical object of diameter $D_{cyl}$ between consecutive links of length $L$. In each case the link width, $W$, and the coefficient of friction, $\mu$, between the object and the links are the same, and determine the friction-cone angle and the maximum permissible joint angle, each equal to $\phi$, which allows a secure grasp. The useful grasping length, $L'$, normalized by the link length, $L$, is

$$\frac{L'}{L} = 1 - \frac{W}{L\tan\phi} \qquad (4)$$

and the largest cylinder that the pair of links can grasp is

$$\frac{D_{cyl}}{L} = 2\mu\,\frac{L'}{L} \qquad (5)$$

Therefore, the grasp length, $L'$, and the largest diameter cylinder that can be grasped depend directly on the aspect ratio, $L/W$, and are both maximized when the aspect ratio is made as large as possible.

Many design considerations affect aspect ratio. For example, the decision to use a 
compact transmission in order to remove actuator bulk from the joint to the base or just 
back a few links improves aspect ratio. When a not-so-compact, single-stage reduction is 
used, the diameter of the final drive pulley or gear at the joint or the length of the output 
link of a four-bar linkage, if made large to increase the effective transmission stiffness, 
transmission ratio, and/or joint strength, decreases aspect ratio. When a more-compact 
multiple-stage reduction increases torque at the joint, aspect ratio is improved over the
single-stage reduction at the cost of lower power efficiency [Townsend 88B] and higher 
complexity. 



2.5 Good Backdrivability 
There are two types of backdrivability: 

• acceleration-dependent and 

• velocity-dependent.




Backdrivability is measured in Cartesian coordinates at a fixed point on the manipulator, usually the endtip. A mechanism which has good acceleration-dependent backdrivability generates only small inertia-induced contact forces when accelerated by the contact. The backdrivability of a single link is improved by minimizing the link-structure inertia and then keeping the transmission ratio smaller than the matched-inertia transmission ratio so that the reflected motor inertia remains relatively small.
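The matched-inertia ratio mentioned above is the reduction ratio $N^*$ at which the reflected motor inertia $N^2 J_m$ equals the link inertia $J_L$, that is $N^* = \sqrt{J_L/J_m}$. It is also the standard ratio that maximizes load acceleration for a given motor torque when friction is ignored. The sketch below uses hypothetical inertias to show how the backdriven inertia $J_L + N^2 J_m$ depends on the chosen ratio.

```python
import math


def reflected_motor_inertia(motor_inertia, ratio):
    """Rotor inertia seen at the joint through a reducer of ratio N: N**2 * Jm."""
    return ratio ** 2 * motor_inertia


def matched_inertia_ratio(link_inertia, motor_inertia):
    """Ratio N* at which the reflected motor inertia equals the link inertia."""
    return math.sqrt(link_inertia / motor_inertia)


def backdriven_inertia(link_inertia, motor_inertia, ratio):
    """Total inertia felt at the joint when it is backdriven (friction ignored)."""
    return link_inertia + reflected_motor_inertia(motor_inertia, ratio)


if __name__ == "__main__":
    J_link, J_motor = 0.04, 1e-4  # kg*m^2, hypothetical values
    n_star = matched_inertia_ratio(J_link, J_motor)  # about 20
    for n in (5.0, 10.0, n_star, 40.0):
        print(round(n, 1), backdriven_inertia(J_link, J_motor, n))
```

Below $N^*$ the motor contributes less than the link to the inertia felt at the contact. Above it, the motor quickly dominates because the reflected term grows with $N^2$.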

Similarly, a mechanism which has good velocity-dependent backdrivability generates
small friction-induced forces in response to imposed endtip velocities. It is commonly 
known that a transmission mechanism which uses worm gears and has dry friction will 
not be backdrivable at all if the pitch angle of the worm gear is less than the friction-cone 
angle. It is worth noting here that, if only power efficiencies are available to compare the 
quality of competing transmission designs, then in many cases the highest-efficiency drive 
will provide the best velocity-dependent backdrivability. 
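The worm-gear remark can be quantified with the common textbook approximation for a square-thread worm. This model ignores the thread pressure angle and bearing friction. With lead (pitch) angle $\lambda$ and friction angle $\phi = \arctan\mu$, the driving efficiency is $\tan\lambda/\tan(\lambda+\phi)$ and the backdriving efficiency is $\tan(\lambda-\phi)/\tan\lambda$. The latter drops to zero (self-locking) once $\lambda \le \phi$. The function name and numbers are illustrative, and the formula assumes $\lambda + \phi < 90°$.

```python
import math


def worm_efficiencies(lead_angle_deg, mu):
    """(forward, backward) efficiency of a square-thread worm drive.

    Simplified model: no thread pressure angle, no bearing losses, and
    lead_angle + friction_angle < 90 degrees. Backward efficiency is 0
    when the drive is self-locking (lead angle <= friction angle).
    """
    lam = math.radians(lead_angle_deg)
    phi = math.atan(mu)  # friction-cone half-angle
    forward = math.tan(lam) / math.tan(lam + phi)
    if lam <= phi:
        backward = 0.0  # cannot be backdriven at all
    else:
        backward = math.tan(lam - phi) / math.tan(lam)
    return forward, backward


if __name__ == "__main__":
    for lead in (4.0, 10.0, 20.0, 30.0):
        fwd, back = worm_efficiencies(lead, 0.1)
        print(f"lead {lead:4.1f} deg: forward {fwd:.2f}, backward {back:.2f}")
```

In this model the backward efficiency is always below the forward one. That is consistent with the observation that, among competing drives, the most efficient one tends to backdrive best.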

The concept of designing manipulators for good backdrivability is borrowed from high- 
quality teleoperator design where high backdriven inertia and friction in the master/slave 
system would mask the transmission of forces in bilateral force reflection. Also, isotropy 
in the backdriven inertia and friction improves teleoperator performance by reducing 
the disparity between the desired and achieved motions. Some manipulator designers as 
well have begun to design for good backdrivability. For example, in order to simplify 
the dynamic equations for calculating the actuator torques in the trajectory control of a 
direct-drive arm, Asada [Asada 84] redesigned a manipulator so that the inertia properties 
at its endtip would be nearly isotropic over a large portion of its workspace. 

Good backdrivability causes the manipulator to behave desirably without dependence on closed-loop control. If closed-loop control is used, system accuracy can be improved. If open-loop force control is used and the manipulator is backdrivable over a practical bandwidth, then forces which are applied to the manipulator are “sensed” at the actuator without the need for endtip sensors. In effect, the distinction between actuator and sensor vanishes. Furthermore, good backdrivability means that the impulse forces generated upon impact will be smaller, and the manipulator will be naturally robust to collisions and impacts, because the effective backdriven mass of the link, $J_i$ of equation 3, is lower.

Figure 3: WAM arm exploring foam blocks.

3 Description of the WAM Design 

This section describes the design of MIT’s WAM arm and the results of initial experiments. 

3.1 Mechanical Design 

Figure 3 shows a photograph of the WAM arm. It has four revolute degrees of freedom 
(without a wrist): three intersecting joints at the base and one distal joint located 0.6 meters 
from the base. The two non-zero-length, cylindrical links have a combined reach of 
1 meter and mass of 4 kg (including the distal joint). The mass of the arm including base 
and motors is 35 kg. All four joint axes have a range of three-quarters of a revolution. 

The arm uses stiff, backdrivable, multi-stage, cabled transmissions between the compact 
joints and the four Moog brushless DC motors located in the base. Joint positions 
are inferred from 12-bit-resolution resolvers mounted on each motor shaft. Torques are 
inferred from Hall-effect sensors which measure the motor-winding currents. By measuring 
positions and torques at the motors, the stability problems associated with noncolocated 
systems are avoided. 

The transmission reducers are located directly at the joints they drive, for maximum 
stiffness (as explained in Section 4.3), instead of following the standard practice of placing 
them at the motor in the form of a gearhead. The transmission ratios are sized so that the 
mechanical advantage is large but the backdriven motor inertia is negligible compared 
to a 0.1-kg payload. Special split-pulley designs allow single-point pretensioning 
with a pretensioning-propagation scheme which automatically sets the correct pretension 
in all stages. A novel cabled differential allows the actuators to be placed closer to the 
base while maintaining backlashless, efficient, and stiff mechanical-power transmission. 
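The sizing rule for the transmission ratio can be written down directly. A motor rotor inertia J_m behind a reduction N appears at the joint as N²J_m. At a point a distance r from the joint axis, it appears as an equivalent mass N²J_m/r². The sketch below finds the largest N for which that reflected mass stays below a chosen fraction of the 0.1-kg payload mentioned above. The rotor inertia, lever arm and "negligible" fraction are hypothetical values, not WAM data.

```python
import math


def reflected_mass(n_ratio, rotor_inertia, lever_arm):
    """Equivalent endtip mass (kg) of a motor rotor seen through a reduction N.

    rotor_inertia: motor rotor inertia J_m in kg*m^2
    lever_arm: distance r from the joint axis to the endtip, in m
    The rotor appears at the joint as N^2 * J_m and at the endtip as N^2 * J_m / r^2.
    """
    return n_ratio ** 2 * rotor_inertia / lever_arm ** 2


def max_ratio(rotor_inertia, lever_arm, payload, fraction):
    """Largest N for which the reflected mass stays below fraction * payload."""
    return lever_arm * math.sqrt(fraction * payload / rotor_inertia)


def joint_torque(n_ratio, motor_torque):
    """Ideal (lossless) joint torque: the reducer multiplies motor torque by N."""
    return n_ratio * motor_torque


if __name__ == "__main__":
    # Hypothetical values, not WAM data.
    j_m = 1.0e-5    # rotor inertia, kg*m^2
    r = 0.4         # joint axis to endtip, m
    payload = 0.1   # payload mass quoted in the text, kg
    fraction = 0.1  # "negligible" taken here to mean 10% of the payload
    n = max_ratio(j_m, r, payload, fraction)
    print(f"N <= {n:.1f}: reflected mass {reflected_mass(n, j_m, r) * 1000:.1f} g, "
          f"joint torque per 0.1 N*m of motor torque {joint_torque(n, 0.1):.2f} N*m")
```

The trade-off is quadratic: torque grows with N, but reflected inertia grows with N². This is why the ratio is bounded by the payload criterion rather than simply made as large as possible.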







Even where volume would have permitted direct-drive motors, such as in the first-axis 
drive, relatively small motors with speed-reducing cable transmissions were used to improve 
torque ripple, backdriven inertia, and the effective motor constant while reducing 
cost and weight. 

The links themselves are long and slender and are covered with a 3-mm-thick dense 
foam to tailor the contact characteristics for manipulating objects. Both links are tubular: 
the inner, aluminum link is designed to absorb large local impacts with its 5-mm thickness 
and small 38-mm radius of curvature (cylinder radius); the outer, carbon-fiber/epoxy link 
is more tailored for low mass while its 25-mm radius of curvature, 2-mm thickness, and 
foam covering afford it ample toughness for impacts. In order to meet these toughness 
constraints, the arm is many times stronger than it must be to lift its maximum 2-kg 
payload against gravity, and many times stiffer than appropriate backdrivable servomotors, 
with or without transmissions, can be. The links themselves are modular so that, 
for example, the outer link could be replaced by an actuator pack to drive a wrist and 
gripper. 

A channel is provided for instrumentation and additional pneumatic or electric power 
line routing from the base to the endtip of the distal link. The manipulator’s mounting 
is simple and requires two floor-mounted anchor bolts. The motor power supplies and 
analog current amplifiers are located 3 meters away in a separate electronics box. 

3.2 Experimental Results 

Initial experimental results from several hundred hours of tests have been reported by 
researchers [Niemeyer 89, Salisbury 89] at MIT’s Artificial Intelligence Laboratory and 
Nonlinear Systems Laboratory. All of the experiments were performed without a wrist, 
end-effector, or other payload. In free trajectories, by using adaptive control, endtip 
speeds have exceeded 10.8 meters/sec with accelerations of 134 meters/sec² (13.7 g). 
Using only the four motor resolvers to estimate position, the endpoint repeatability of the 
arm is ±1 mm. When applying contact forces, the maximum vertical force which has been 
applied by the arm when outstretched horizontally is 8 kgf. The force resolution with 
motor-torque ripple and cogging compensation is ±0.2 kgf. The stiffness of the fourth 
joint, which has the longest transmission span (0.6 meters), is 1800 newton-meters/radian 
with the motor mechanically locked. The highest controlled stiffness of the fourth joint is 
100 newton-meters/radian, limited by the maximum stable motor-controller gain. 
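The figures quoted above mix SI and gravitational units. A short conversion using standard gravity (9.80665 m/s²) puts them on a common footing and reproduces the 13.7 g stated in the text. The two ratios at the end are simple quotients of the reported values and introduce no new data.

```python
G0 = 9.80665  # standard gravity, m/s^2


def accel_in_g(accel):
    """Convert an acceleration in m/s^2 to multiples of standard gravity."""
    return accel / G0


def kgf_to_newton(kgf):
    """Convert kilogram-force to newtons."""
    return kgf * G0


if __name__ == "__main__":
    # All inputs are the values reported in the experimental results.
    print(f"134 m/s^2 = {accel_in_g(134.0):.1f} g")
    print(f"8 kgf = {kgf_to_newton(8.0):.1f} N")
    print(f"0.2 kgf = {kgf_to_newton(0.2):.2f} N")
    print(f"force range / resolution = {8.0 / 0.2:.0f}")
    print(f"locked / controlled stiffness of joint 4 = {1800.0 / 100.0:.0f}")
```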

4 Novel Transmission Mechanisms for Whole-Arm Manipulation 

This section describes two cabled transmission mechanisms developed specifically for the 
WAM arm. 

4.1 The Choice of Cable Drives 

When properly designed, cable drives have high material strength, low weight, low velocity 
and torque ripple, no backlash, and low friction. Furthermore, they do not leak, do not 
require surface lubrication, and can be guided over long distances around pulleys through 
complex and twisting geometries. Cables and all other tension-element drives, such as 
tapes and belts, do not transfer power through compression or shear, and so avoid added 
compliance and strength limitations from bending moments or buckling. When designed 
for reliability, cable drives have a history of dependability in such demanding applications 
as aerial trams, ski lifts, cable cars, light-aircraft control surfaces, cranes, and elevators. 

Figure 4: Differential mechanism on the WAM arm. 

4.2 Cabled Differential 

Figure 4 is a photograph of the cabled differential, invented for this arm, as integrated 
in the WAM arm. Figure 5 shows details of the cable-differential concept. Unlike traditional 
bevel-gear differentials, the cabled differential has only rolling contact (cable-to-pulley) 
and so has extremely low friction without the need for surface lubricants. The tooth-frequency 
torque ripple, gear noise, and backlash normally associated with traditional differential 
drives are eliminated. Also, fabrication is simple, requiring only concentric steps to be 
lathed without re-chucking the workpiece. Finally, the design is extremely stiff because 
there is zero free length of cable as it unwinds from one pulley and immediately winds 
onto its mating pulley. 
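For readers unfamiliar with differentials, the kinematics can be sketched generically. Two input pulleys, driven by motors a and b, produce one output motion proportional to the sum of their rotations and another proportional to their difference. By virtual work, the two motor torques add for one output and subtract for the other. The "pitch"/"roll" names, the reduction N and the factor of 1/2 are a hypothetical normalization. The paper does not give the ratios of the WAM differential.

```python
def differential_forward(theta_a, theta_b, n_ratio):
    """Motor angles -> (pitch, roll) output angles of an idealized differential."""
    pitch = (theta_a + theta_b) / (2.0 * n_ratio)
    roll = (theta_a - theta_b) / (2.0 * n_ratio)
    return pitch, roll


def differential_inverse(pitch, roll, n_ratio):
    """(pitch, roll) -> motor angles; exact inverse of differential_forward."""
    return n_ratio * (pitch + roll), n_ratio * (pitch - roll)


def output_torques(tau_a, tau_b, n_ratio):
    """Motor torques -> (pitch, roll) output torques, from the virtual-work balance
    tau_a*d(theta_a) + tau_b*d(theta_b) = tau_pitch*d(pitch) + tau_roll*d(roll)."""
    return n_ratio * (tau_a + tau_b), n_ratio * (tau_a - tau_b)


if __name__ == "__main__":
    n = 5.0  # hypothetical reduction
    print(differential_forward(1.0, 0.4, n))
    print(output_torques(0.3, 0.3, n))  # equal torques: all to pitch, none to roll
```

In this idealized model, equal motor torques sum at the pitch output and cancel at the roll output. Equal and opposite torques do the reverse.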

4.3 High-Speed Transmission with Specially Designed Speed Reducer 

Commonly, a speed-reducer mechanism is included in the transmission to boost the actuator 
torque capacity and the effective motor constant of a small-but-fast actuator by the 
magnitude of the speed reduction. Conventionally, the reducer is located at the motor 
shaft and the mechanical designer only selects its magnitude, N (also called the transmission 
ratio), which is the ratio of motor speed to joint speed. However, the designer is free to 
select both its location and its magnitude! Figure 6 shows the model of a transmission 
with a speed reducer located arbitrarily between the motor and the joint it drives. Let the 
distance between the motor and joint be L and the distance between the motor and the 
reducer be C. The cable is sized for a given maximum stress, so the low-speed tension 
element, which carries N times the tension of the high-speed element, requires an 
N-times heavier cable. 








Figure 5: Differential cabling concept. 



In many types of transmissions, such as tension-element drives, both stiffness and 
strength are proportional to the tension (or compression) cross-sectional area. Suppose 
the cross-sectional areas for the high- and low-speed parts of the transmission are selected 
so that the stress is constant along the transmission. We find that, although the element 
in the high-speed part of the transmission is lighter than that in the low-speed part by a 
factor of N, the effective stiffness of the high-speed part for a given transmission length is 
greater by a factor of N. Furthermore, the effective transmission stiffness, $k_{\mathrm{eff}}$, for 
the transmission model of figure 6, measured at the joint, is 

$$k_{\mathrm{eff}} = \frac{N^2 E A}{N L + C(1 - N)} \qquad (6)$$

where L is the transmission length, E is the modulus of elasticity of the transmission-element 
material, and A is the cross-sectional area of the high-speed part of the transmission. 
Since N > 1, the term C(1 − N) is negative, so the effective transmission stiffness of 
equation (6) is maximized by letting C = L, i.e., by placing the reduction mechanism at the 
joint, so that the high-speed part of the transmission spans the entire distance between the 
actuator and joint. The stiffness then becomes N²EA/L, a factor of N greater than the value 
NEA/L obtained with the reducer at the motor (C = 0). 
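Equation (6) is easy to explore numerically. The sketch below uses hypothetical values for E, A and N, together with the 0.6-m fourth-joint span quoted earlier. It evaluates k_eff as the reducer position C moves from the motor (C = 0) to the joint (C = L). As a cross-check, it also rebuilds the same stiffness from two springs in series, referred to the joint side. The high-speed segment of length C has stiffness EA/C, reflected through the reducer by N². The low-speed segment of length L − C has N times the area under constant-stress sizing. The series construction is our reading of the model, not a derivation given in the text.

```python
def k_eff_formula(n, e_mod, area, length, c):
    """Equation (6): effective transmission stiffness measured at the joint."""
    return n ** 2 * e_mod * area / (n * length + c * (1 - n))


def k_eff_series(n, e_mod, area, length, c):
    """Same stiffness from two series springs referred to the joint side.

    High-speed segment: length c, area A, reflected through the reducer by n^2.
    Low-speed segment: length (length - c), area n*A (constant-stress sizing).
    """
    compliance = 0.0
    if c > 0:
        compliance += c / (n ** 2 * e_mod * area)
    if c < length:
        compliance += (length - c) / (n * e_mod * area)
    return 1.0 / compliance


if __name__ == "__main__":
    # Hypothetical values: steel-like modulus, 1 mm^2 high-speed cable,
    # the 0.6-m fourth-joint span quoted in the text, and N = 8.
    n, e_mod, area, length = 8.0, 2.0e11, 1.0e-6, 0.6
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        c = frac * length
        print(f"C = {c:.2f} m: k_eff = {k_eff_formula(n, e_mod, area, length, c):.3e} N/m")
```

The stiffness rises monotonically with C. Its value at C = L is N times its value at C = 0, matching the statement above.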

The benefits of placing the reducer at the joint would be lost if we used a geared 
reducer that was significantly bulkier or heavier than the joint alone. For this 
reason, we developed the compact, lightweight, cabled speed-reducer shown integrated 
with the fourth joint of the WAM arm in figure 7. The mechanics of this design are 
illustrated in figure 8. 

Placing the reducer at the joint has several benefits. First, since the stiffness is greater, 
trajectory and contact force bandwidths are increased. The greater stiffness also improves 
closed-loop force-control stability [Townsend 87]; and, because the transmission operates 
with higher tension-element speeds, the power efficiency [Townsend 88B], and therefore 
the backdrivability, are improved. 






Figure 6: Choosing the speed-reducer location (diagram labels: high-speed tension element, speed reducer, low-speed tension element, environment). 



5 The Future 

Although we are presently focusing on the next mechanical-design challenges of whole-arm 
manipulation, such as the torque-output quality of the drive motors and the integration 
of a WAM-style wrist and hand, it is important to consider the broader implications of 
whole-arm manipulation. 

Research is needed to explore the new possibilities for intelligent manipulation. Since 
high-performance transmissions, driven by actuators of comparable quality, reduce the 
reliance on endtip-force feedback, high-level force-control schemes based on a vector of 
joint torques become practical. New strategies such as controlling large forces near the 
base of a link and controlling small-but-accurate forces near the tip of a link should be 
considered. Similar strategies could be employed to vary the effective contact impedance 
passively by choosing the point of contact along a link. 
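The suggestion of controlling large forces near the base of a link and small, accurate forces near its tip follows from the lever relation F = τ / d. This holds for a contact force applied perpendicular to the link at distance d from the joint. The sketch below uses a hypothetical joint-torque limit and resolution to show how both force capacity and force resolution scale with the contact point. It treats a single joint and ignores gravity and the other joints.

```python
def contact_force_limits(tau_max, tau_res, d):
    """Return (max force, force resolution) in N at distance d (m) from the joint.

    Assumes the contact force is perpendicular to the link, so tau = F * d.
    """
    if d <= 0:
        raise ValueError("contact distance must be positive")
    return tau_max / d, tau_res / d


if __name__ == "__main__":
    tau_max, tau_res = 20.0, 0.05  # hypothetical joint torque limit and resolution, N*m
    for d in (0.1, 0.3, 0.6):
        f_max, f_res = contact_force_limits(tau_max, tau_res, d)
        print(f"contact at {d:.1f} m: up to {f_max:.0f} N, resolution {f_res:.2f} N")
```

The ratio of capacity to resolution is the same everywhere along the link. What changes is where that range sits, which is the freedom the strategy above proposes to exploit.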

Appropriate kinematics must be considered to maximize the observability and controllability 
of forces for exploring uncertain environments and for performing a variety of 
manipulation tasks such as grasping between adjacent links and controlling line contacts. 
There are many human tasks which involve capabilities only now becoming achievable in 
robot hardware. In soccer, the ball is guided by a contact between it and the edge of 
the player’s shoe. A person gathers a scattered collection of objects between her arms. 
The Olympic-style wrestler controls forces with tremendous speed and agility along many 
parts of his body simultaneously. Without vision, the edges of a box are explored quickly 
by groping with arms and hands. The capabilities which enable these human tasks will 
also enable tasks more appropriate for robotic hardware. As researchers and designers, 
we must expand our expectations for force control and begin to explore these entirely new 
horizons as robots interact more with their environment. 










Figure 7: Speed reducer located at the fourth joint of the Whole-Arm Manipulator. 

References 

[1] Andeen, G.B., and Kornbluh, R., “Design of Compliance in Robotics,” Proceedings 
of the 1988 IEEE International Conference on Robotics and Automation, Philadelphia, 
PA, April 1988. 

[2] Asada, H., “Dynamic Analysis and Design of Robot Manipulators Using Inertia 
Ellipsoids,” Proceedings of the First International Conference on Robotics, Atlanta, GA, 
March 1984. 

[3] Nevins, J.L., and Whitney, D.E., “The Force Vector Assembler Concept,” Proceedings 
of the First CISM-IFToMM Symposium, Udine, Italy, 5-8 September 1973, pp. 273-288. 

[4] Niemeyer, G., and Slotine, “Computational Algorithms for Adaptive Compliant 
Motion,” Proceedings of the 1989 IEEE International Conference on Robotics and 
Automation, Scottsdale, AZ, May 1989. 

[5] Salisbury, J.K., “Teleoperator Hand Design Issues,” Proceedings of the IEEE International 
Conference on Robotics and Automation, San Francisco, CA, April 1986. 

[6] Salisbury, J.K., “Whole-Arm Manipulation,” Proceedings of the 4th International 
Symposium of Robotics Research, Santa Cruz, CA, August 1987. 

[7] Salisbury, J.K., Townsend, W.T., Eberman, B.S., DiPietro, D., “Preliminary 
Design of a Whole-Arm Manipulation System (WAMS),” Proceedings of the IEEE International 
Conference on Robotics and Automation, Philadelphia, PA, April 1988. 

[8] Salisbury, J.K., Eberman, B.S., Townsend, W.T., and Levan, M.D., “Design 
and Control of an Experimental Whole-Arm Manipulator,” Proceedings of the 1989 
International Symposium on Robotics Research, Tokyo, Japan, 1989. 








Figure 8: Mechanics of the remotely-located speed-reducer design. 

[9] Slotine, J.J., “Sliding Controller Design for Nonlinear Systems,” International Journal 
of Control, Vol. 40, No. 2, 1984. 

[10] Townsend, W.T., and Salisbury, J.K., “The Effect of Coulomb Friction and Stiction 
on Force Control,” Proceedings of the 1987 IEEE International Conference on Robotics 
and Automation, Raleigh, NC, April 1987, pp. 883-889. 

[11] Townsend, W.T., “The Effect of Transmission Design on Force-Controlled Manipulator 
Performance,” MIT PhD thesis published as AI-TR-1054 by the Artificial Intelligence 
Laboratory, Cambridge, MA, April 1988. 

[12] Townsend, W.T., and Salisbury, J.K., “The Efficiency Limit of Belt and Cable 
Drives,” ASME Journal of Mechanisms, Transmissions, and Automation in Design, 
Vol. 110, No. 3, September 1988, pp. 303-307. 

[13] West, H., “Kinematic Analysis for the Design and Control of Braced Manipulators,” 
MIT PhD thesis, Department of Mechanical Engineering, Cambridge, MA, June 1987. 






Whole-Hand Manipulation: 

Design of an Articulated Hand Exploiting All Its 
Parts to Increase Dexterity 



Gabriele Vassura* and Antonio Bicchi ** 

* DIEM - Dipartimento di Ingegneria delle Costruzioni Meccaniche, Nucleari, Aeronautiche e 
di Metallurgia, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy 

** Centro “E. Piaggio”, Facoltà di Ingegneria, Università di Pisa, Via Diotisalvi 2, 56100 
Pisa, Italy. 

Abstract 

It is a common observation that the human hand performs various manipulation tasks using 
not only its fingertips, but all the surfaces available for contact, i.e. intermediate phalanges 
and the palm. In the first part of this paper some such whole-hand operations are discussed, 
relating to different domains of fine manipulation such as grasping, exploration and 
micro-motion of objects. 

Design guidelines for reproducing a similar behaviour in a robotic hand are then drawn from 
the analysis of whole-hand manipulation operations in humans: requirements on mechanical 
architecture (to provide proper surface conformation and opposition of hand elements) and 
on sensory equipment (to allow the synthesis of satisfactory control procedures) result 
from this analysis. 

A second part of the paper describes how these issues can be implemented in the version II 
U.B. Hand, currently under development. The suitability of the version I kinematic architecture 
for whole-hand manipulation is exploited by integrating purposely designed force/torque sensors 
into the mechanical structure, according to the intrinsic tactile sensing concept: the external 
surface of each phalange in the fingers and that of the palm thus become integral parts of as 
many sensing devices. 

The final part of the paper provides preliminary suggestions on how to use the proposed 
hand to perform some tasks requiring whole-hand manipulation. 

Introduction 

It can be observed, in many robotic applications, that the potential functionality of existing 
robotic devices is seldom fully exploited, often due to limitations in sensory equipment or 
control procedures, but sometimes also to limits in their original conception. 

As an example, most present robots are designed to interact with the environment through 
their end-effector, thus limiting the range of possible operations and objects the robot can deal 
with; the whole arm manipulation concept, involving the use of most parts of the robot arm to 
accomplish an enlarged set of tasks, was only recently proposed by [Salisbury,87], and 
preliminary applications are now being demonstrated. 

In the field of articulated robot hands, the one this paper is concerned with, a parallel can 
be easily drawn with the above example: most present robot hands are designed (or at least are 
used as if designed) for manipulating objects using only their fingertips, while the human 
hand performs various manipulation tasks using all the surfaces available for contact, i.e. 







fingertips, intermediate phalanges and the palm. This results in more powerful grasps, finer 
control of object motion, or better sensory information, which leads to enhanced dexterity of 
the hand. 

In practice, the performance of some articulated hands, in spite of their complex and 
expensive multi-dof mechanical structure, is not so far from that of simpler and cheaper 
grippers. It is the authors’ opinion that some design criteria need to be revised in view of a more 
effective integration of mechanical and sensory equipment, and that useful results can be obtained 
if the means for full exploitation of hand elements are provided in the design phase. 

Borrowing the term from Salisbury, by whole hand manipulation (WHM) we mean that all 
the links of the multi-D.O.F. kinematic chain of the hand can be used to contact and sense the 
object. As in the whole arm manipulation case, the WHM concept has been derived from 
observation of a biological system, the human hand, but design solutions are not necessarily 
anthropomorphic. 

The goal of the work reported in this paper is to realize an artificial hand that can perform 
some WHM operations. In order to do this, three main aspects have to be developed: i) the 
hand design must allow for suitable kinematics, ensuring proper mobility and opposability of 
hand’s elements; ii) sensors must be integrated in all the parts of the hand that are used to 
contact manipulated objects, and iii) sensory control methods have to be developed to 
guarantee the necessary degrees of flexibility and adaptability to unpredictable environments. 
Although the main emphasis of the paper is on the design of the mechanical and sensory 
components, other aspects of the project will be addressed. 

The implementation of the concepts discussed in this paper is currently being carried out: a 
prototype finger, suitable for whole hand manipulation, has been built and is described in this 
report. Previous experience with designing and testing the version I UB Hand, which provides the 
basis for the mechanical arrangement of the newly proposed one, will also be briefly 
explained. 



Examples of whole hand manipulation in human activity 

The functionality of the human hand has been widely investigated in its various aspects 
[Schlesinger,1919] [Keller, 1947] [Tubiana,1981]; it is, however, easy to verify intuitively how 
frequently each part of the hand (the palm, the phalanges and the fingertips) gets into contact 
with the objects. In the following, some cases of human whole hand manipulation are 
commented on in order to extract suggestions for robotic hand design. 

The first example (see Fig. 1a,b,c,d) refers to a typical pick-and-place task for objects of 
similar shape (a rectangular prism) but different size and mass. Different grasp configurations, 
each involving more contacts over a larger area, are used by the human hand in order to 
improve stability. 




Fig. 1 Four grasps with different extension of contact surface 







The progressive involvement of further structural elements of the hand (the inner phalanges 
and the palm) leads to increased strength of the grasp against the weight of the body being 
lifted. 

The second example relates to a task which requires a “power grasp” of a tool in a 
constraining environment. The tool must be initially grasped in a configuration which is 
compatible with the constraints (the hammer is lying on a plane), and then moved inside the 
hand towards a final grasp configuration which is suitable for accomplishing the task. This 
case is very common when operating in an unstructured environment, where many objects 
have limited accessibility for grasping, which imposes initial grasp configurations different 
from those required by the task. Another typical example is picking up a pencil in fingertip 
prehension, and then manipulating it to the final configuration of Fig. 3. The hand first 
grasps the object on its available surface, typically in fingertip mode (Fig. 2a), 
then partially lifts it, while the object is forced to move through a number of intermediate 
configurations by controlled slipping or rolling or by finger relocation (Fig. 2b). Once the 
final configuration (Fig. 2c) has been reached, the power grasp of the tool and the task 
accomplishment become possible. This example shows that the whole surface of the hand is 
not only used for the final constraint, but is also crucial for implementing internal 
manipulation procedures. 




Fig. 2 Manipulation before final grasp 

Finally, a third example relates to a fine manipulation task, which consists of holding a pen 
and writing. The pen, Fig. 3, is usually held in a four-contact grasp: a lateral contact on 
the middle fingertip (A); two contacts on the index finger, at the fingertip pad (B) and on the lateral 
surface of the proximal phalange (D); and one contact on the thumb fingertip (C). The task of 
writing along a line is a combination of transverse motion of the pen tip, obtained by fine 
motion of the fingers, and motion of the hand along the line, achieved by moving the wrist or even the arm. 
It is interesting to note that at points A, B, C no slippage or rolling usually occurs, and small 
motion between the fingers and the pen is allowed by the compliance of the pads, while at contact 
D slippage is frequent. It is also relevant to note that the stability and the precision of the 
grasp greatly depend on using contacts on the lateral surface of the fingers. 

Observation of the biological model thus offers useful suggestions for robotic hand design 
and encourages their implementation. In the following section, some consequent design 
issues will be presented. 

Design issues and requirements 

In order to reproduce some of the capabilities of the human hand, including whole hand 
manipulation, a robotic system must exhibit a number of features inherent to its mechanical 
design as well as to its sensory equipment, control methods and computational architecture. In 
this section, some of these requirements are examined, trying to make the suggestions coming 
from the human model explicit. 

-Number of fingers. The minimum number of frictional contacts necessary to firmly 
grasp a generic object is three [Salisbury, 82]; therefore, to obtain general enough grasp 
capabilities, three fingers are the least possible number. With three properly designed and 
controlled fingers it is also possible to move the grasped object in all directions and orientations. 
A fourth finger is useful (but not strictly necessary) in some cases, e.g. when an exploration 
of the surface of the object being grasped is required. A fourth or even a fifth finger can also 
help to augment the strength of the grasp in heavy tasks. 

-Number of DOF’s. The minimum number of independently actuated joints in the 
fingers to obtain full mobility of the grasped object is three, if slip motions between the finger 
pads and the object surface are allowed [Salisbury, 82]. On the other hand, if the capability of 
rolling the fingertips relative to the object is desired, then at least three parallel joints per finger 
are necessary. 

-Opposability of fingers and palm. Besides benefiting from increased internal mobility of the 
hand, manipulation dexterity can take advantage of a proper configuration of the DOF’s in the 
kinematic chain. From this point of view, the position and the excursion of each joint can 
greatly affect the resulting manipulability [Kerr, 86]: e.g., the ability of the “thumb” to rotate 
about an axis normal to the palm surface permits the opposability of the lateral surfaces of the 
“index” fingers. 

-Shape of the phalanges. The smoothness of the surfaces of the hand links plays a key 
role in allowing controlled fine motions of an object, obtainable by rolling and/or slipping. As 
taught by biological models, an elliptic or circular phalange cross section often provides well 
shaped contact areas for bodies of any shape. Another important requirement is related to the 
smoothness of surface connections between adjacent links: a conical or cylindrical shaping of 
the whole finger makes it possible to easily move the contact point from one link to 
another during manipulation and to extend the contact area to more than one link when operating 
with large, flat objects. 

-Material properties of the pads. Some characteristics of the finger surface are 
desirable for dextrous manipulation. High friction, low stiction, and rather compliant 
materials can greatly increase grasp stability, by extending the effective contact surface with 
smooth objects or by reducing edge effects when sharp bodies are manipulated. 

-Proprioceptive sensory equipment. Sensing the internal variables of the hand is 
necessary to realize effective low-level control loops of actuators. In particular, joint position 
sensors must have high resolution to allow fine control of finger motions. Joint torque 
sensors can be used to close a control loop around the disturbance source (mainly friction in 
mechanical transmission of power from actuators to the joint), thus achieving better control 
performance; however, these sensors are not necessary if good transmission means are 
adopted. 







-Exteroceptive sensory equipment. The role of exteroceptive (i.e., relating to interactions 
with the environment) sensory information in dextrous manipulation has been widely 
recognized ever since the first literature appeared in this field. Indeed, its importance 
becomes intuitive when one considers how the human hand can perform innumerable tasks 
effectively. Notwithstanding this, the analysis of functional requirements for the exteroceptive 
sensory equipment of a dextrous hand has not yet been carried out satisfactorily. The 
fundamental work of Harmon (see e.g. [Harmon, 82]) in the field of tactile sensing, for 
instance, consisted in reporting what a group of industrial and academic researchers felt to be 
the necessary features of tactile sensors. These opinions, though influential, derived from the 
assumption of biomorphic models for sensors more than from objective functional analysis. 
The development of analytical methods for dextrous manipulation and the appearance of 
innovative non-anthropomorphic contact sensors offered a new viewpoint for the statement of 
sensory requirements (for an introduction of these themes, see [Mason, 85]). In the following 
text we will briefly discuss some functional considerations on the sensory equipment of 
dextrous hands, with particular reference to whole hand manipulation. A subdivision of 
dextrous manipulation tasks into three main classes will be considered: micro-motion, grasp, 
and exploration of manipulated objects. 

a) Micro-motion. The accomplishment of fine motion of objects held by an articulated 
hand requires three basic steps: i) determination of the kinematic relationship between 
object motions in Cartesian space and motions of the contact points between the object and the 
hand phalanges and palm (i.e., identification of the grip transform, see [Mason, 85]); 
ii) determination of the kinematic relation between motions of contact points and motions of 
hand joints (i.e., the hand Jacobian); iii) control of joint position along specified trajectories. 
For a precise determination of both the grip transform and the hand Jacobian, the accurate 
knowledge of contact points is mandatory. Hence, sensors in the hand must primarily provide 
information about the position of every zone of contact between the object and any element of 
the hand (finger phalanges and palm). 
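The first two steps can be sketched for a planar case. Given the object's planar twist (v_x, v_y, ω) and a contact point at offset r from the object reference point, the contact-point velocity is v + ω × r. This is the rigid-body relation that the grip transform encodes. The joint rates of a two-joint finger touching that point then follow from inverting the Jacobian of the contact point. The finger geometry, link lengths and function names are hypothetical. The sketch assumes the contact point does not move on either surface (no rolling or slipping), and it omits step iii).

```python
import math


def contact_point_velocity(vx, vy, omega, rx, ry):
    """Planar velocity of the object point at offset (rx, ry) from the object
    reference point: v + omega x r, with omega about the out-of-plane axis."""
    return vx - omega * ry, vy + omega * rx


def finger_joint_rates(q1, q2, l1, s, vcx, vcy):
    """Joint rates of a hypothetical planar two-joint finger whose contact lies on
    the distal link at distance s from the second joint (s = l2 is a fingertip
    contact; s < l2 is a contact partway along the distal phalange).

    Assumes the contact point is fixed on the finger (no rolling or slipping).
    """
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    # Jacobian of the contact point with respect to (q1, q2).
    j11, j12 = -l1 * s1 - s * s12, -s * s12
    j21, j22 = l1 * c1 + s * c12, s * c12
    det = j11 * j22 - j12 * j21  # equals l1 * s * sin(q2)
    if abs(det) < 1e-12:
        raise ValueError("finger Jacobian is singular (q2 = 0 or s = 0)")
    dq1 = (j22 * vcx - j12 * vcy) / det
    dq2 = (-j21 * vcx + j11 * vcy) / det
    return dq1, dq2


if __name__ == "__main__":
    # Hypothetical object twist and contact geometry (m, m/s, rad/s).
    vc = contact_point_velocity(0.01, -0.02, 0.5, 0.03, 0.01)
    dq = finger_joint_rates(0.4, 0.9, 0.05, 0.03, *vc)
    print(f"contact velocity {vc}, joint rates {dq}")
```

In a whole-hand setting, the same computation would be repeated for every contact, including those on inner phalanges and the palm. This is why the text stresses knowing every contact location.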

b) Grasp. The grasp of an articulated hand on an object can be described by the number 
and position of finger-object contacts and by the wrenches exerted through the contacts. In 
order to synthesize a grasp, the hand controller has to determine i) where to put the finger 
phalanges and the palm with respect to the object surface, and ii) the intensity and direction of 
the wrench in each contact. The former is basically a planning problem, which can be 
approached on the basis of a priori knowledge of the object shape (see e.g. [Nguyen, 86]) 
or with the help of global sensors like vision. The choice of optimal contact wrenches can be 
carried out, under the assumption that grasp geometry and external load are exactly defined, by 
criteria such as those proposed by [Kerr, 86] and [Bologni,88]; an adaptive method for choosing 
grasp forces in changing conditions has been proposed by [Bicchi,89]. Effective control of 
contact wrenches requires a sensor to feed back (besides contact positions) the 3-component 
vector of contact force and the 3-component vector of contact torque, where contact force and 
torque mean the resultants of distributed pressures over the contact area. Slippage avoidance 
(or control) is also a major concern in object grasping: a sensor able to evaluate slippage 
danger at each contact, and to detect when slippage actually occurs, would be very useful for 
grasping operations. 
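One common way to quantify the slippage danger mentioned above assumes Coulomb friction at a point contact. It compares the tangential force with the largest tangential force the friction cone allows, μ f_n, and values approaching 1 indicate incipient slip. The helper below is a hypothetical illustration of that index, not the authors' method. The contact normal, friction coefficient and force values are assumptions.

```python
import math


def slip_index(force, push_dir, mu):
    """Return |f_t| / (mu * f_n) for a point contact with Coulomb friction.

    force: 3-vector applied by the hand element on the object.
    push_dir: direction in which the hand element pushes on the object
              (the contact normal); it need not be a unit vector.
    mu: friction coefficient. Values below 1 lie inside the friction cone;
    values approaching 1 indicate incipient slip. A non-pressing contact
    (f_n <= 0) returns math.inf.
    """
    norm = math.sqrt(sum(c * c for c in push_dir))
    n_hat = [c / norm for c in push_dir]
    f_n = sum(f * n for f, n in zip(force, n_hat))
    if f_n <= 0.0:
        return math.inf
    f_t = [f - f_n * n for f, n in zip(force, n_hat)]
    return math.sqrt(sum(c * c for c in f_t)) / (mu * f_n)


if __name__ == "__main__":
    # Hypothetical contact on a phalange: 1 N normal load, 0.3 N tangential.
    print(slip_index([0.3, 0.0, 1.0], [0.0, 0.0, 1.0], mu=0.5))
```

A sensor that reports the full contact force vector and the contact location, as required above, provides all the inputs this index needs.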

c) Exploration. Through active exploration of objects with an articulated hand, it is 
possible to obtain very rich information about object characteristics that would otherwise be 
difficult to acquire. As an example, one could manipulate the object to learn its shape, the texture 
of its surface, its hardness, its thermal or even chemical properties, etc. Sensory equipment 
for acquiring this information might consist of several transducers based on different 
principles. However, we will consider here only the features that the hand sensors must 
exhibit in order to allow the basic explorative movements which are prerequisite for most 
active perceptual tasks, i.e. to move a finger along the object surface while exerting a 
controlled pressure on it. To do this, control algorithms can be developed (see [Bicchi,89]) 
which require information about contact position and measurement of contact forces and 
torques on the hand surface. 

It should be pointed out that so far we referred to contact points as if contacts occurred at 
single points on the object and hand surfaces. This assumption does not hold when rather 
compliant materials are employed to cover the hand’s surfaces (or the object itself is compliant). 
In this case, a small-area contact will occur most often; information about contact area shape, 
and possibly about very small object features contained in such area, could be required from the 
sensory equipment of the hand. In most cases, though, it could suffice to know 
approximately the position of the contact area on the hand surface, by knowing the position of 
one of its points. 
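The intrinsic tactile sensing concept mentioned in the abstract obtains such a contact point from a force/torque sensor inside a phalange or the palm. The simplest case can be sketched under explicit assumptions. The contact surface is taken to be a plane at known height h above the sensor origin. The sensor reads force f and moment m = r × f + τ ẑ, where τ is a possible torsional moment about the surface normal, as a soft contact can produce. Solving for the point r on that plane gives its coordinates and τ. Real phalange surfaces are curved, and the frame convention and names here are hypothetical.

```python
def contact_on_plane(f, m, h, min_normal_force=1e-6):
    """Locate a contact on the plane z = h from sensor force f and moment m.

    Assumes m = r x f + (0, 0, tau) with r = (rx, ry, h), i.e. a single contact
    region on a flat surface, summarized by one point and a torsional moment.
    Returns (rx, ry, tau). Requires a nonzero normal force f[2].
    """
    fx, fy, fz = f
    mx, my, mz = m
    if abs(fz) < min_normal_force:
        raise ValueError("normal force too small to locate the contact")
    ry = (mx + h * fy) / fz           # from mx = ry*fz - h*fy
    rx = (h * fx - my) / fz           # from my = h*fx - rx*fz
    tau = mz - (rx * fy - ry * fx)    # remaining moment about the normal
    return rx, ry, tau


if __name__ == "__main__":
    # Hypothetical reading: build m from a known contact, then recover it.
    r, f, tau = (0.01, -0.02, 0.005), (0.3, -0.1, 2.0), 0.004
    m = (r[1] * f[2] - r[2] * f[1],
         r[2] * f[0] - r[0] * f[2],
         r[0] * f[1] - r[1] * f[0] + tau)
    print(contact_on_plane(f, m, h=r[2]))
```

One point and one torsional moment summarize the contact region. This is the "position of one of its points" that the text considers sufficient in most cases.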



Robotic end effectors: an overview 

The idea of using all the parts of the hand to manipulate objects follows naturally from
observing the human example; thus, the tendency to reproduce this capability with artificial
devices can be traced back to the earliest prosthetic hands, developed several decades ago. Of
course, the lack of any sensory and control capability prevented such devices from achieving
any autonomous dexterity.

A review of the state of the art of robotic devices shows that, while some attempts have been
made to design mechanical structures that exploit all their elements in some manipulation
tasks, their application has again been hindered by unsuitable sensory equipment and by
practical limitations of control algorithms. In most cases, manipulation control methods have
been defined (and sometimes implemented) for multifingered hands operating with their
fingertips only: significant contributions to the analysis of multifingered hand capabilities
are in [Yoshikawa,85], [Kerr,86] and [Li,88]; the recent work by [Li,89] provides an elegant
formalization of manipulation modes, in which rolling, slipping and finger relocation are
considered.

Robotic end-effectors with a palmar surface acting in opposition to the fingertips have been
proposed by [Skinner,75] and [Rovetta,77], while an adaptable grasping device with many
contact surfaces distributed all along the kinematic chain of each finger (resulting in an
articulated tentacle) was designed by [Hirose,78]. A multifingered gripper capable of
adaptable grasping with many contacts enveloping an object was proposed by [Vassura,80].
An articulated, three-fingered hand proposed by [Okada,79] fulfilled some of the structural
requirements to perform WHM: even though no palmar surface was present and only joint
position sensors were employed, the finger design was suitable for locating contacts on the
lateral surfaces of the distal and intermediate phalanges, and some demonstrations of fine
motion of objects were performed. The Utah-MIT Hand [Jacobsen,84], being basically
anthropomorphic, is in principle suitable for whole hand manipulation tasks, even though the
sensorization and control of the device have not yet been perfected. The Stanford-JPL Hand of
[Salisbury,82] instead emphasizes fingertip manipulation; it is, however, already capable of
very fine manipulations by exploiting sensory feedback from built-in force/torque sensors
located in the fingertips. Other known dexterous hand projects, such as the Karlsruhe
[Doll,88] and the MITI [Kaneko,88] hands, seem mainly oriented to fingertip manipulation.

The version II Belgrade hand [B&L,89] is able to grasp objects using its phalanges and 
palm; the variable configuration gripper designed by [Ulrich, 88] emphasizes the role of the 
palm in order to enhance its grasping capability. No provision is made in these projects for 
dextrous manipulation control through sensory feedback. 

An interesting prototype hand has been presented by [Oomichi,88]: tactile and force sensors
are also integrated in the intermediate phalanges, and the palm is exploited in power grasps.

As a concluding remark, it can be observed that, even though a widespread opinion holds that
the most important research topics in dexterous manipulation are related to control
algorithms and task planning, much also remains to be done in hand design, since a
satisfactory integration between mechanical structure and distributed sensory equipment,
enabling dextrous whole hand manipulation, is still far from complete.

The U.B. Hand 

The first version of the U.B. Hand [Belletti,86] [Bologni,88] started working on a test frame
in March 1988 and has been operating on an IBM 7565 gantry robot since the beginning of
1989; most of the experimental work carried out so far has consisted of examining the
reliability of the device and evaluating its effectiveness in dexterous manipulation tasks,
with particular reference to grasping.

In order to provide a quantitative measure of the grasping ability of a hand, to enable a
comparison of different hand designs, and hence to guide the choice among possible
solutions, it was necessary to overcome the limitations of previous grasp classification
methods, which were found to be qualitative and insufficiently detailed. A method for
classifying all the achievable grasps has been proposed by [Bologni,88].

The proposed method is based on the generation of a table of the feasible contact
configurations, in which all possible oppositions of the hand elements (proximal,
intermediate, and distal phalanges of the two "index" fingers with each other, or with the
proximal and distal phalanges of the "thumb" and with the palm) are enumerated (see Fig.4).
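A minimal Python sketch of the enumeration step, under two assumptions not stated in the text: only pairwise oppositions are listed, and the kinematic feasibility of each pair is decided by a user-supplied test. The labels `index A`/`index B` and the `is_feasible` hook are hypothetical; the actual table of [Bologni,88] may be organized differently.

```python
from itertools import product

# Element labels follow the description above; names are illustrative only.
INDEX_PHALANGES = ("proximal", "intermediate", "distal")
THUMB_PHALANGES = ("proximal", "distal")


def opposition_table(is_feasible=lambda a, b: True):
    """Enumerate candidate pairwise oppositions of hand elements.

    Each element is a (part, phalanx) tuple. `is_feasible` is a hypothetical
    hook where a kinematic reachability test for the pair would go.
    """
    finger_a = [("index A", p) for p in INDEX_PHALANGES]
    finger_b = [("index B", p) for p in INDEX_PHALANGES]
    thumb = [("thumb", p) for p in THUMB_PHALANGES]
    palm = [("palm", None)]
    fingers = finger_a + finger_b

    candidates = {
        "finger vs finger": list(product(finger_a, finger_b)),
        "finger vs thumb": list(product(fingers, thumb)),
        "finger vs palm": list(product(fingers, palm)),
    }
    # Keep only the pairs accepted by the (assumed) feasibility test.
    return {mode: [pair for pair in pairs if is_feasible(*pair)]
            for mode, pairs in candidates.items()}


if __name__ == "__main__":
    for mode, pairs in opposition_table().items():
        print(f"{mode}: {len(pairs)} candidate oppositions")
```

Passing a real feasibility predicate (for instance one based on joint ranges) would prune the 27 candidate pairs down to the achievable ones.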

The version I U.B. Hand has been evaluated according to this method, showing that its
kinematic configuration was suitable for whole hand manipulation. Experimental tests
confirmed the effectiveness of the hand, especially in grasping: some whole hand grasps,
mimicking the human hand examples presented above, are shown in Fig. 5. In order to further
improve mechanical effectiveness and reliability, and to exploit the propensity of the
version I kinematic architecture for whole hand manipulation, a second version of the hand is
being developed.

Fig.4 The table of opposition modes (finger(s) vs thumb; finger(s) vs palm)

Fig. 5 The U.B.Hand

The fingers of the version II U.B. Hand are designed according to a biomorphic
skeleton-and-flesh model: in each phalanx, an external shell, covered by a compliant
high-friction layer and capable of sensing contacts, is connected to an inner rigid element of
the kinematic chain.

The skeleton structure is composed of CNC-machined links, connected through ball-bearing
revolute pairs. The design emphasizes modularity and favors permanent assembly solutions in
order to increase reliability and reduce the number of parts. The 11 finger joints are actuated
through tendons and pulleys. The adopted configuration permits easy removal of the external
shell, allowing thorough access to, and easy intervention on, the tendons. A detailed sketch of
one of the "index" fingers of the hand is shown in Fig.6.



Fig.6 Integration of sensors and mechanical structure (labelled parts: proximal phalange
sensor, intermediate phalange sensor, fingertip sensor, phalange structure, sensor beam;
cross section)

The shape of the external shell of the finger phalanges has been chosen so as to provide a
regular contact surface all around the finger axis. The intermediate phalanges are covered
with a cylindrical surface of elliptic cross-section, while the fingertip shells are ellipsoids of
revolution with the longitudinal axis inclined 20 degrees upward. A flat surface in the upper
region enhances the approach capability, as shown in Fig. 7. Finally, the palm surface has
been designed as a portion of the convex surface of a large-radius sphere.





Fig. 7 Fingertip approach capability 

The general view of the version II U.B. Hand can be seen in Fig. 8: the modular design will
allow the synthesis of different configurations, e.g. by varying the position of the fingers
relative to the palm. The adduction-abduction movements of the upper fingers are
independent, so that synchronous lateral movements of both fingers are allowed.





Fig. 8 Kinematic architecture of the U.B. Hand (joint ranges in degrees, joint symbols partly
illegible in the source: 0-15; 0-90 for five joints; ±90; 20. Link dimensions: A = 32 mm,
B = 45, C = 35, D = 30, E = 38, F = 45, G = 35, H = 35, I = 23, L = 22)



The sensory equipment of the hand consists basically of joint position sensors
(conventional shaft encoders on the motor axes) and contact sensors, realized by means of the
Intrinsic Tactile (IT) sensing method [Bicchi,87]. This approach appears to satisfy most of the
functional specifications examined above, while its implementation does not require overly
complex hardware and software.

An IT sensor has very simple constituent parts: a 6-axis force/torque sensor and a cover shell
whose surface (the phalanx or palm surface) has known geometry.

According to the results of [Bicchi,89], if the IT sensor shell contacts an object over a small
area, and no adhesive forces are exerted through the contact, then by processing the
force/torque and geometric information it is possible to determine:

- the position of the contact centroid, i.e. a point on the shell surface that is guaranteed to
lie inside the contact area;

- the resultant contact force applied at the contact centroid, in intensity and direction;

- the resultant contact torque, in intensity and direction.
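To illustrate how a contact location can follow from force/torque data plus known shell geometry, here is a sketch restricted to the simplest case: a point contact (no torque about the contact normal) on an ellipsoidal shell centred at the sensor origin, like the fingertip shells described above. The function name and the point-contact simplification are assumptions; the method of [Bicchi,89] also handles the resultant contact torque. The measured moment fixes the line of action of the force, intersecting it with the shell gives two candidates, and the no-adhesion condition (the force must push into the shell) selects one.

```python
import numpy as np


def contact_centroid(force, moment, semi_axes):
    """Locate a point contact on an ellipsoidal shell from 6-axis readings.

    The shell is x^2/a^2 + y^2/b^2 + z^2/c^2 = 1 in the sensor frame, and
    `moment` is measured about the sensor origin. Point contact is assumed,
    so moment = c x force for the contact point c.
    """
    f = np.asarray(force, dtype=float)
    m = np.asarray(moment, dtype=float)
    A = np.diag(1.0 / np.asarray(semi_axes, dtype=float) ** 2)

    # Points on the line of action satisfy c x f = m: c = c0 + lam * f,
    # where c0 is the point of that line closest to the origin.
    c0 = np.cross(f, m) / (f @ f)

    # Substituting into c^T A c = 1 gives qa*lam^2 + qb*lam + qc = 0.
    qa = f @ A @ f
    qb = 2.0 * (c0 @ A @ f)
    qc = c0 @ A @ c0 - 1.0
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        raise ValueError("line of action does not meet the shell")

    root = np.sqrt(disc)
    for lam in ((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)):
        c = c0 + lam * f
        outward_normal = A @ c  # gradient of (c^T A c) / 2
        if outward_normal @ f < 0.0:  # no adhesion: force pushes inward
            return c
    raise ValueError("no compressive contact is consistent with the data")
```

For example, a shell with semi-axes (15, 10, 10) mm touched at (15, 0, 0) by a force (-2, 1, 0) N produces a moment (0, 0, 15) N mm about the origin, and `contact_centroid` recovers the point (15, 0, 0).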

Comparing this with the information needed by a manipulation-oriented sensory system shows
that IT sensing fulfills most of the conditions for W.H.M.; the exception is the capability of
fine imaging of features inside the contact area. We decided to postpone the realization of
such fine imaging, since it would require very high resolution skin-like tactile sensors which,
even if available at present, would represent a computational bottleneck for the whole
system.

In order to accomplish the required functional capabilities of the sensory system for whole 
hand manipulation, IT sensing had to be realized in each phalanx and in the palm, making a 
total of 9 sensors. 

A crucial problem in implementing IT sensing is the miniaturization of the 9 force/torque
sensors employed in the hand. Different design schemes have been adopted in order to fit
them in different parts of the hand, namely the fingertips, the intermediate phalanges and the
palm. All the sensors, however, employ semiconductor strain gauges applied to deformable
aluminum structures; common to the design of all the force/torque sensors is also the
optimization approach employed to maximize sensor accuracy despite their small size. This
approach models each force/torque sensor in terms of linear operations on the vector V of
strain measurements obtained from the strain gauges:

$$V = C\,P \qquad (1)$$

where C is the compliance matrix of the mechanical structure of the sensor, relating the load
vector P to the measurements V. The load vector P is composed of the six unknown
components of the force and torque acting on the sensor in a specified reference frame; the
components of P are normalized with respect to the nominal value of each component, so that
the norm of P, $\|P\|$, is always less than or equal to 1.
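The linear model (1) can be inverted to recover the load from the strain readings. The sketch below assumes a known, calibrated C that maps normalized loads to strains; for a minimal design (six gauges) it solves the square system, and for redundant gauges it uses a least-squares estimate, which is our choice for illustration rather than a procedure stated here. The `nominal` argument is a hypothetical name for the per-component normalization values.

```python
import numpy as np


def decode_load(C, V, nominal):
    """Invert V = C P for the normalized load P, then rescale it.

    C: (m, 6) compliance matrix for normalized loads, m >= 6 gauges.
    V: (m,) vector of strain measurements.
    nominal: (6,) nominal value of each force/torque component.
    Returns (physical_load, P).
    """
    C = np.asarray(C, dtype=float)
    V = np.asarray(V, dtype=float)
    if C.shape[0] == C.shape[1]:
        # Minimal design: as many gauges as load components.
        P = np.linalg.solve(C, V)
    else:
        # Redundant gauges: least-squares estimate of P.
        P, *_ = np.linalg.lstsq(C, V, rcond=None)
    return P * np.asarray(nominal, dtype=float), P
```

Keeping P normalized inside the solver and rescaling only at the end matches the normalization described above, so that force and torque components of very different magnitude enter C on a comparable footing.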

This model of the force/torque sensor leads to some considerations about sensor design. The
first is that, if a six-component load vector P is to be measured, then only 6 measurements
are strictly necessary. Though trivial, keeping this fact in mind may be useful when designing
a force/torque sensor under stringent size limitations.

Numerical stability analysis techniques may be applied to the linear model of the sensor in
order to evaluate its accuracy. The causes of error in a multicomponent sensor can in fact be
divided into three main groups:

i) errors in strain measurements, caused by instrumentation inaccuracies, noise, etc. These
errors appear as a term dV added to the measured strain vector V.

ii) errors in the compliance matrix coefficients, due to imperfect knowledge of the
load-strain relationship of the sensor structure. The matrix C can in fact be evaluated either
numerically (e.g. with beam theory or with finite element methods) or directly, by calibrating
the sensor with known loads; in either case, an error matrix dC will result from modeling
inaccuracies or from experimental errors.

iii) possible amplification of the errors above can occur while solving the linear system (2): 

$$V + dV = (C + dC)\,(P + dP) \qquad (2)$$

Equation (2) represents the true load-measurement relationship idealized in (1); dP is the
resulting error on the final output of the force sensor, the load vector P.

If a minimal sensor design is adopted, i.e. as many strain gauges are used as there are load
components, the generalized form of Wilkinson's formula for error propagation can be applied
to give an a priori estimate of the relative error on P:

$$e_P = (e_V + e_C)\,K_P(C) \qquad (3)$$

where $e_V = \|dV\|/\|V\|$, $e_C = \|dC\|/\|C\|$ and $e_P = \|dP\|/\|P\|$ are, respectively, the
relative errors on the strain measurements, on the calibration and on the results. The
propagation factor $K_P(C)$ has an upper bound that is close to the condition number $N(C)$ of
the compliance matrix C:

$$K_P \lesssim N(C) = \|C\|\,\|C^{-1}\| \ge 1$$



If more strain-gauges are employed in the sensor than are strictly required, a slightly more 
complex propagation formula can be obtained (see [Bicchi,89]). 
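The role of the condition number can be checked numerically. The sketch below considers only strain errors (dC = 0), for which the bound $e_P \le N(C)\,e_V$ holds exactly for a square, invertible C in the 2-norm; `np.linalg.cond` computes N(C) as the ratio of the largest to the smallest singular value. The function name and the two test matrices are illustrative assumptions.

```python
import numpy as np


def relative_error_and_bound(C, P, dV):
    """Return (e_P, N(C) * e_V) for a strain perturbation dV with dC = 0."""
    C = np.asarray(C, dtype=float)
    P = np.asarray(P, dtype=float)
    V = C @ P
    P_hat = np.linalg.solve(C, V + dV)  # load computed from noisy strains
    e_P = np.linalg.norm(P_hat - P) / np.linalg.norm(P)
    e_V = np.linalg.norm(dV) / np.linalg.norm(V)
    return e_P, np.linalg.cond(C) * e_V


# A well-conditioned and a poorly conditioned compliance matrix.
well = np.eye(6)
poor = np.diag([1.0, 1.0, 1.0, 1.0, 1.0, 1e-3])
P = np.full(6, 0.4)
dV = np.full(6, 1e-4)
for name, C in (("well", well), ("poor", poor)):
    e_P, bound = relative_error_and_bound(C, P, dV)
    print(f"{name}: N(C) = {np.linalg.cond(C):.0f}, e_P = {e_P:.2e}, bound = {bound:.2e}")
```

With the same strain noise, the poorly conditioned matrix yields a relative load error several hundred times larger, which is the amplification the design criterion below aims to limit.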

From this analysis of the causes of error in force/torque sensors, it follows that there are
essentially two ways to increase accuracy: (a) reducing the source errors i) and ii), basically
by employing more sophisticated technologies in strain measurement and calibration, and (b)
reducing the amplification of the source errors by minimizing the condition number of the
compliance matrix. While further suppression of source errors will at some point conflict with
technological or economic limitations, error propagation can be limited by careful sensor
design. Hence, in designing the various force/torque sensors employed in our articulated
hand, we used an optimization method whose merit criterion was the minimization of the
condition number of the sensor compliance matrix.
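As a toy illustration of this merit criterion, the sketch below uses a hypothetical two-gauge, two-component planar model in which gauge i responds to the projection of the load (Fx, Fy) on its direction θi. It is not the authors' sensor model; it only shows a grid search that picks the gauge orientation minimizing N(C).

```python
import numpy as np


def toy_compliance(theta1, theta2):
    """Hypothetical planar model: gauge i measures the projection of the
    load (Fx, Fy) on the direction theta_i (radians)."""
    return np.array([[np.cos(theta1), np.sin(theta1)],
                     [np.cos(theta2), np.sin(theta2)]])


def best_gauge_angle(theta1=0.0, n=181):
    """Grid-search theta2 over (0, pi) for the minimum condition number."""
    # Drop the endpoints, where the two gauges are parallel when theta1 = 0.
    grid = np.linspace(0.0, np.pi, n)[1:-1]
    conds = np.array([np.linalg.cond(toy_compliance(theta1, t)) for t in grid])
    i = int(np.argmin(conds))
    return grid[i], conds[i]


angle, n_c = best_gauge_angle()
print(f"optimal theta2 = {np.degrees(angle):.1f} deg, N(C) = {n_c:.3f}")
```

For this toy model the condition number is $\sqrt{(1+|\cos\Delta|)/(1-|\cos\Delta|)}$, with Δ the angle between the gauges, so the search settles on orthogonal gauges (N(C) = 1). A real sensor design would replace `toy_compliance` with a beam or finite element model and optimize several geometric parameters at once.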

The structures of the sensors realized inside the fingertips, the intermediate phalanges and
the palm are shown in Fig. 9a, b and c, respectively.

The fingertip sensor simply consists of a thin-walled cylinder, on which strain gauges are
applied at optimal locations and orientations. This arrangement has some attractive features,
which have been discussed in [Bicchi,87].

In the intermediate phalanx sensors, one end of an internally drilled beam of rectangular
cross-section is fixed to the phalanx shell, the opposite end being fixed to the finger frame
(the skeleton, so to speak). Gauges are bonded on the beam surface; the lengths of the section
sides, the radius of the internal hole, and the position and orientation of the gauges have
been chosen following the optimal design procedure described above.




Fig. 9 Sensor configuration for fingertip, phalange and palm 



Finally, the palm sensor sketched in Fig. 9c consists of three thin inclined flexures, spaced
120 degrees apart behind the palm surface, on which strain gauges are placed. Again, the
inclination angle and the location of the strain gauges on the flexures are chosen to optimize
sensor accuracy.

Conclusions 

This paper has reported on the design of both the mechanical structure and the sensory
equipment of a robotic articulated hand intended to accomplish tasks by fully exploiting all of
its links. From an analysis of some dextrous manipulation operations performed by the human
hand, some design requirements have been derived: it should be possible to touch
manipulated objects with every part of the hand surface, to detect the position of each
contact point, and to measure the forces and torques exerted at each contact.

The basic idea of integrated and distributed Intrinsic Tactile sensing in a purposely designed
mechanical configuration, the version II U.B. Hand, has been presented, and its practical
feasibility illustrated. The design of the mechanical and sensory equipment is the first step
towards the implementation of a complete system for dextrous manipulation, some elements
of which are still being developed. However, a prototype finger designed according to the
proposed principles has been built and preliminarily tested.

Despite the broad potential of the device, deriving from its 11 degrees of freedom and
thorough sensorization, the resulting design appears reasonably compact and feasible, both in
its mechanical structure, thanks to simplified machining and assembly, and in its sensory
equipment and data integration, thanks to the peculiarities of intrinsic tactile sensing.

Future work will mainly focus on the design and implementation of a suitable computational
architecture and sensory control procedures.



Acknowledgements 

This work reports on the activity of the interdisciplinary research group from DIEM and DEIS
of the University of Bologna, composed of G. Vassura and C. Bonivento, as coordinators, and
L. Bologni, S. Caselli, E. Faldella, C. Melchiorri, V. Parenti Castelli and A. Tonielli. The
version II U.B. Hand has been designed in cooperation with the Centro "E. Piaggio" of the
University of Pisa, where the sensory equipment has been developed.

The authors wish to thank Stefano Monti for his activity in setting up the version II finger 
prototype. 

Financial support was provided by IBM Italia and CNR Progetto Finalizzato Robotica 
grant N. 89.00561. PF.67.3. 



References 

B&L Engineering, "The Belgrade/USC Robot Hand", Brochure, Preliminary Release, May 1989.

M. Belletti, C. Bonivento, G. Vassura, "Sviluppo di Organo di Presa ad Elevata Destrezza per
Robot Industriale" (Development of a High-Dexterity Gripping Device for Industrial Robots),
Convegno naz. SIRI, Milano, 1986.

A. Bicchi, P. Dario, "Intrinsic Tactile Sensing for Artificial Hands", Robotics Research,
R. Bolles and B. Roth Editors, MIT Press, 1987.

A. Bicchi, J. K. Salisbury, P. Dario, "Augmentation of Grasp Robustness Using Intrinsic
Tactile Sensing", Proc. 1989 IEEE Conf. on Robotics and Automation, Scottsdale, AZ,
May 14-19, 1989.

A. Bicchi, "Strumenti e metodi per il controllo di mani per robot" (Methods and Devices for
the Control of Robot Hands), Doctoral Dissertation, University of Bologna, 1989.

L. Bologni, "Robotic Grasping: How to Determine Contact Positions", Proc. IFAC Symp. on
Robot Control (SYROCO), 1988.

L. Bologni, S. Caselli, C. Melchiorri, "Design Issues of the U.B. Hand", NATO ARW on
Robots with Redundancy, Salò, Italy, 1988.

C. Bonivento, S. Caselli, E. Faldella, C. Melchiorri, A. Tonielli, "Control System Design of
a Dextrous Hand for Industrial Robots", IFAC SYROCO '88, Karlsruhe, 1988.

T. J. Doll, H. J. Schneebeli, "The Karlsruhe Hand", IFAC SYROCO '88, Karlsruhe, 1988.

L. D. Harmon, "Automated Tactile Sensing", Int. Jour. of Robotics Research, vol. 1, no. 2,
pp. 3-32, 1982.

S. Hirose, Y. Umetani, "The Development of Soft Gripper for the Versatile Robot Hand",
Proc. 7th ISIR, Tokyo, 1977.

S. C. Jacobsen et al., "The Utah-MIT Dextrous Hand: Work in Progress", Int. Jour. of
Robotics Research, vol. 3, no. 4, 1984.

M. Kaneko, K. Tanie, "Basic Considerations on the Development of a Multi-fingered Robot
Hand with the Capability of Compliance Control", Proc. Int. Meeting on Advances in Robot
Kinematics, Ljubljana, September 1988.

A. D. Keller, C. C. Taylor, V. Zahm, "Studies to Determine the Functional Requirements for
Hand and Arm Prosthesis", Dept. of Eng., UCLA, 1947.

J. Kerr, B. Roth, "Analysis of Multifingered Hands", Int. Jour. of Robotics Research, vol. 4,
no. 4, MIT Press, 1986.

Z. Li, S. Sastry, "Dextrous Robot Hands: Several Important Issues", Proc. IEEE Workshop on
Dextrous Robot Hands, Philadelphia, 1988.

Z. Li, J. F. Canny, S. Sastry, "On Motion Planning for Dexterous Manipulation, Part I: The
Problem Formulation", Proc. IEEE Int. Conf. on Robotics and Automation, Phoenix, 1989.

M. T. Mason, J. K. Salisbury, "Robot Hands and the Mechanics of Manipulation", MIT Press,
Cambridge, MA, 1985.

V. D. Nguyen, "The Synthesis of Stable Force-Closure Grasps", MIT AI Lab Technical
Report 905, 1986.

T. Okada, "Object Handling System for Manual Industry", IEEE Trans. on Systems, Man and
Cybernetics, vol. SMC-9, pp. 79-89, 1979.

T. Oomichi, T. Miyatake, A. Maekawa, T. Hayashi, "Mechanics and Multiple Sensory
Bilateral Control of a Fingered Manipulator", Int. Conf. on Robotics Research, 1988.

A. Rovetta, "On Specific Problems of Design of Multi-Purpose Mechanical Hand Industrial
Robots", Proc. 7th ISIR, Tokyo, 1977.

J. K. Salisbury, "Kinematic and Force Analysis of Articulated Hands", Ph.D. Dissertation,
Stanford University, 1982.

J. K. Salisbury, J. J. Craig, "Articulated Hands: Force Control and Kinematic Issues",
Int. Jour. of Robotics Research, vol. 1, no. 1, 1982.

J. K. Salisbury, "Interpretation of Contact Geometries from Force Measurements", Robotics
Research, M. Brady and R. Paul Editors, MIT Press, 1984.

J. K. Salisbury, "Whole-Arm Manipulation", Proc. 4th Int. Symp. on Robotics Research,
Santa Cruz, CA, August 1987.

G. Schlesinger, "Der mechanische Aufbau der künstlichen Glieder" (The Mechanical
Construction of Artificial Limbs), in Ersatzglieder und Arbeitshilfen, M. Borchardt et al.
(Eds.), Springer, 1919.

F. Skinner, "Multiple Prehension Hands for Assembly Robots", Proc. 5th ISIR, Chicago,
1975.

R. Tomovic, G. A. Bekey, W. J. Karplus, "A Strategy for Grasp Synthesis with Multifingered
Robot Hands", Proc. IEEE Int. Conf. on Robotics and Automation, Raleigh, 1987.

R. Tubiana, "The Architecture and Functions of the Hand", The Hand, vol. 1, W. B. Saunders
& Co., pp. 19-93, 1981.

N. Ulrich, R. Paul, R. Bajcsy, "A Medium-Complexity Compliant End Effector", Proc. IEEE
Int. Conf. on Robotics and Automation, Philadelphia, 1988.

G. Vassura, A. Nerozzi, "Study and Experimentation of a Multifinger Gripper", Proc. 10th
ISIR, Milano, 1980.

T. Yoshikawa, "Manipulability of Robotic Mechanisms", Int. Jour. of Robotics Research,
vol. 4, no. 2, Summer 1985.

T. Yoshikawa, K. Nagai, "Analysis of Grasping and Manipulation by Multi-Fingered Robot
Hands", Proc. IEEE Workshop on Dextrous Robot Hands, Philadelphia, 1988.




Stable Grasping and Manipulation by a Multifinger 
Hand with the Capability of Compliance Control 



Kazuo Tanie and Makoto Kaneko 

Cybernetics Division, Robotics Department 
Mechanical Engineering Laboratory, MITI
Tsukuba City, Ibaraki, 305 Japan 



Abstract 

This paper deals with two fundamental problems concerning a multi-fingered robotic hand with
the capability of compliance control. The first relates to the development of a torque sensor
suitable for tendon-pulley driving systems, and the second to stable grasping and
manipulation. Tendon-pulley driving systems are commonly used to actuate the finger joints of
multifingered hands. In this kind of driving system, a compact joint torque sensor plays an
important role in achieving compliant motions at the finger tip and, in turn, in constructing
dextrous multifingered hands. However, a satisfactory torque sensor for such a driving system
does not yet seem to have been developed. To address this, a Tension Differential type
Torque sensor (TDT sensor) is first proposed in this paper and applied in experiments to a
newly designed robotic hand with two articulated fingers. Secondly, the stable grasping and
manipulation problems of the multifingered hand are addressed, assuming the existence of
friction at the contact area between each finger tip and the grasped object. To formulate a
stable grasping condition, the stiffness matrix of an object grasped by fingers with
compliance-adjustable joints was introduced. Using the stiffness components, the condition
was described in a simple form. The stiffness matrix was resolved into a simpler form at the
tip of each finger when the hand was grasping an object. To achieve the desired stiffness
matrix at the tip, the joint stiffness matrix of each finger was adjusted. To manipulate an
object, the desired object trajectories were converted to joint trajectories using the inverse
kinematics equations, and the position reference of each joint servomechanism was adjusted
accordingly, while keeping the stable grasping condition. A robotic hand with two articulated
fingers equipped with specially designed small TDT sensors was constructed. Using this hand,
various experiments were carried out and the proposed methods were confirmed.



1. Introduction 

Two topics are described in this paper. First, a newly designed tension differential type
torque sensor suitable for tendon-pulley driving systems is proposed. Second, stable grasping
and manipulation problems are discussed based on the compliance control method.

In order to grasp objects of various shapes and to manipulate them dexterously, numerous
multi-fingered robot hands have been designed [3, 5, 7, 9]. Because of the limited space
available for installing actuators inside the fingers, the actuators are usually placed far from
the joints they drive, and power is transmitted from each actuator to its finger joint through
appropriate transmission devices. A tendon-pulley system is a popular device for this
purpose. In general, the transmission system includes elements with low structural stiffness
and with friction, such as tendons, gears and conduits. This makes it difficult to realize
high-quality compliance control of such a robot hand without a force (or torque) feedback
loop.

In active compliance control for robot hands, force (or torque) sensing has been introduced
either at the finger tip [10] or at the joints [3, 4, 9]. Force sensing at the finger tip requires an
inverse kinematic calculation to determine the corresponding forces in joint space, which
results in a low control speed. Joint torque sensing removes this drawback, because no
coordinate transformation is needed to provide force feedback to each joint. Accordingly,
joint torque sensing is probably the appropriate choice for high-speed joint torque control.

A typical tendon-pulley driving system has agonist and antagonist tendons for each
single-axis joint. To obtain the joint torque, the tension difference between the two tendons
must be measured. The popular technique uses two tendon-tension sensors, one installed on
each tendon, and calculates the difference between the two sensor outputs to obtain the
torque. Salisbury and Craig [9] developed a cantilever-type tendon-tension sensor with an
idler pulley and strain gauges. Jacobsen et al. [3] and Dario and Buttazzo [4] used similar
methods to obtain tendon tensions for a four-fingered robot hand and a single-finger system,
respectively. Although the torque around the drive pulley can be successfully measured by
taking the difference between two tendon tensions, this approach has two disadvantages:
(i) in principle, two tendon-tension sensors are necessary for each joint, although the tension
itself is not the main quantity of interest for the control system; this may prevent the
construction of a hand system with few signal lines and a simple configuration; (ii) the bias
tendon tension always acts on the base of the sensor beam, which makes it difficult to obtain a
wide dynamic range and to remove drift due to residual stress.

To cope with this, the authors propose the Tension Differential type Torque sensor (TDT
sensor), which is based on the idea that the torque around a drive pulley is proportional to the
tension difference, which can be measured directly without sensing the individual tendon
tensions. The working principle of this sensor will be demonstrated with some experimental
results.

Regarding the stable grasping problem, this paper discusses a stable grasping condition
described in terms of the compliance parameters of the grasping system. To keep the
discussion practical, the friction effects between a finger tip and the grasped object are
considered. There are many works on multifingered hands [1, 2, 6]; however, few researchers
deal with friction effects in grasping, so their analytical results are difficult to apply to
practical grasping systems. In the analytical discussion, a grasping model including friction
effects is introduced. To find the grasping stability condition, the stiffness matrix of an object
supported by compliant fingers is first constructed. Through a potential energy analysis, it
will be shown that grasping stability is assured if this matrix is positive definite. To provide
variable compliance, task coordinate position servos are constructed at the tip of each finger.
The loop gain of each servo system is changed to give the appropriate compliance along each
axis of the task coordinate system. Manipulation is carried out by adjusting the reference
value of each position servo system. The analytical results will be confirmed by experiments
using a robotic hand with two newly developed fingers.
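One common way to turn a desired finger tip stiffness $K_x$ into joint stiffness is the congruence mapping $K_\theta = J^T K_x J$ used in joint-space stiffness control, which neglects the term due to the change of the Jacobian under load. The sketch below applies it to a hypothetical two-link planar finger; the link lengths, joint angles and function names are illustrative assumptions, and the paper's own resolution procedure may differ in detail.

```python
import numpy as np


def planar_jacobian(q1, q2, l1, l2):
    """Jacobian of the tip position of a two-link planar finger."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [l1 * c1 + l2 * c12, l2 * c12]])


def joint_stiffness(K_tip, q1, q2, l1, l2):
    """Joint stiffness K_theta = J^T K_tip J that reproduces the desired
    tip stiffness for small displacements (load-dependent term neglected)."""
    J = planar_jacobian(q1, q2, l1, l2)
    return J.T @ np.asarray(K_tip, dtype=float) @ J


# Hypothetical finger: 50 mm and 40 mm links, stiff along x, soft along y.
K_tip = np.diag([200.0, 50.0])          # N/m in the task frame
K_theta = joint_stiffness(K_tip, 0.3, 0.8, 0.05, 0.04)
print(K_theta)                          # N m/rad, symmetric
```

Setting the loop gains of the joint servos from `K_theta` makes a small tip displacement produce the restoring force prescribed by `K_tip`, as long as the finger stays near the configuration used to compute J.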







2. Tension Differential Type Torque Sensor (TDT Sensor) 

The basic principle of the TDT sensor is illustrated in Fig. 1(a). Let $F_1$ and $F_2$ be the
tendon tensions on the two sides of pulley A. Then the torque T around the drive pulley is
given by



$$T = r\,(F_1 - F_2) \qquad (1)$$

where r is the radius of the pulley. Note that the torque can be obtained by directly measuring 
the tendon-tension difference instead of measuring two tendon-tensions individually. The TDT 
sensor is based on this idea. 



Fig. 1 Two types of torque sensors: (a) tension differential type torque sensor;
(b) conventional type torque sensor

This sensor is composed of two idler pulley parts and a cantilever beam part incorporating two 
strain gauges. When a torque is applied to pulley A, the y-directional force at the tension 
pulleys is expressed by 



$$F_{yi} = 2F_i \sin\alpha \quad (i = 1, 2) \qquad (2)$$



Then, the moment acting on the beam with strain gauges is given by 

$$M = (F_1 - F_2)\, l \sin\alpha \qquad (3)$$

Combining Eqs. (1) and (3) and setting $a = l \sin\alpha / r$, we obtain

$$M = a\,T \qquad (4)$$

Eq. (4) means that the strain gauge output is proportional to the torque applied around the drive 
pulley. 
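Eqs. (1)-(4) can be checked numerically. In the sketch below the lengths l and r and the angle α are hypothetical values chosen for illustration; the loop also shows that adding the same bias tension to both tendons leaves the beam moment unchanged, which is the property behind advantage (3) of the TDT sensor.

```python
import math


def pulley_torque(F1, F2, r):
    """Eq. (1): torque around the drive pulley, T = r (F1 - F2)."""
    return r * (F1 - F2)


def beam_moment(F1, F2, l, alpha):
    """Eq. (3): moment on the strain-gauge beam, M = (F1 - F2) l sin(alpha)."""
    return (F1 - F2) * l * math.sin(alpha)


def sensitivity(l, alpha, r):
    """Constant of Eq. (4): a = l sin(alpha) / r, so that M = a T."""
    return l * math.sin(alpha) / r


# Hypothetical geometry: l = 10 mm, r = 5 mm, alpha = 30 degrees.
l, r, alpha = 0.010, 0.005, math.radians(30.0)
for F1, F2 in ((30.0, 10.0), (130.0, 110.0)):  # same difference, +100 N bias
    T = pulley_torque(F1, F2, r)
    M = beam_moment(F1, F2, l, alpha)
    aT = sensitivity(l, alpha, r) * T
    print(f"F1={F1}, F2={F2}: T={T:.3f} N m, M={M:.3f} N m, a*T={aT:.3f} N m")
```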

Figure 1(b) shows a popular torque sensor for tendon-pulley systems, in which two 
individual sensors detect tensions of the agonist and the antagonist tendons and the difference 
of those tensions is calculated in a differential amplifier to get the torque output. Comparing it 
with the proposed sensor, the following advantages of the TDT sensor can be identified: 

(1) The number of strain gauges is reduced by half. 

(2) The sensor consists of a single element and it is possible to make it smaller. 

(3) The sensor beam receives a moment only when a torque is applied at the drive pulley,
whereas tendon tension always acts on each sensing element in Fig. 1(b). This helps avoid
residual stress in the sensor, which may cause unfavorable nonlinear characteristics.

Figure 2 shows a general view of the proposed TDT sensor. The static characteristics of this
sensor are shown in Fig. 3, where the white circles are the results for the sensor with pulley B,
as shown in Fig. 1. The black circles were obtained after removing pulley B. The linearity
between applied torque and sensor output is fairly good in both experiments. Even without
pulley B, the sensor can detect the torque without a marked reduction in sensitivity.
Therefore, it should be possible to remove pulley B to simplify the structure.




Fig. 2 A prototype TDT sensor 





Fig. 3 Static characteristics of a TDT sensor (torque sensor output in V versus applied torque
in N cm, from about -20 to 20 N cm; o: with pulley B, •: without pulley B)

3. Stiffness Based Stable Grasp 
3.1 Theoretical Approach 

Figure 4 shows an object grasped by two fingers, (a) without and (b) with friction at each
finger tip. Suppose that the compliance at each finger tip can be controlled and that the object
is supported by the two fingers through this compliance. In the figure each finger is drawn as
a spring to emphasize the compliance effects. For simplicity, the influence of gravity is
ignored. In the frictionless model shown in Fig. 4(a), finger tip forces can be applied only in
directions normal to the surfaces. When an external force makes the object rotate slightly
counterclockwise, it follows from the geometry that a restoring moment is generated.
Therefore, the grasp is stable for such a small rotation. Figure 4(b) is a more practical model
in which friction exists at the finger tips. In this case the forces applied at the finger tips need
not be normal to the object surface. When the tangential forces are smaller than the friction
forces, the finger tips move together with the grasped object according to the applied external
forces. Figure 4(b) shows the behavior of the grasping system when a counterclockwise
external torque is applied. As this behavior shows, the reaction forces and torques act to
reinforce the external torque, resulting in an unstable grasp. As this simple example reveals,
even a grasping system that is stable in the absence of friction may become unstable when
friction effects are introduced. This paper proposes an approach to achieving stable grasping
in the presence of friction.

[Fig. 4 legend: arrows show the finger tip force in the equilibrium state and the finger tip force under a small rotation]

An n-fingered two-dimensional grasping model, shown in Fig. 5, is used for the analysis.
Because the compliance at each finger tip is assumed to be adjustable, each finger i is represented
by two virtual springs, $k_{xi}$ and $k_{yi}$, at its tip. Finger tip i has a coordinate system
($o_i$-$x_i y_i$) whose origin $o_i$ is at the contact point between the finger tip and the object,
and $k_{xi}$ and $k_{yi}$ are the stiffness parameters along the $x_i$ and $y_i$ axes, respectively.
The directions of these axes are determined by the compliance center of the object, which can be
placed anywhere inside the object: the $y_i$ axis is chosen along the line through the finger tips
and the compliance center, and $x_i$ is perpendicular to $y_i$. This choice allows grasp stability
to be discussed in a simple way.

Suppose that finger tip i is moved along the $y_i$ axis to hold an object. When the virtual
position reference along the $y_i$ axis lies inside the object, the virtual spring along the $y_i$
axis is compressed and generates an internal force, which corresponds to the grasping force on
the object. The virtual spring along the $x_i$ axis, on the other hand, remains at its natural
length if no external force is applied, and therefore exerts no force on the object. Nevertheless,
it plays an important role in keeping the grasp stable under a small rotation of the object,
because it always generates a restoring moment against such a rotation. This is formulated in
detail below.

The stiffness matrix K of the grasped object in Fig. 5 can be expressed in the following form, 





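$$K = \begin{bmatrix} k_{xx} & k_{xy} & k_{x\theta} \\ k_{yx} & k_{yy} & k_{y\theta} \\ k_{\theta x} & k_{\theta y} & k_{\theta} \end{bmatrix} \qquad (5)$$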





Now, for simplicity, assume a decoupling condition between rotational motion and 
translational motion. This condition is given by, 

$$k_{x\theta} = k_{\theta x} = 0 \qquad (6)$$

$$k_{y\theta} = k_{\theta y} = 0 \qquad (7)$$

By examining the positive definiteness of the stiffness matrix K, the following stability
condition is obtained:

$$k_{\theta} > 0 \qquad (8)$$
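Under the decoupling conditions (6) and (7), det K equals $k_\theta$ times the determinant of the upper-left 2×2 translational block, so once that block is positive definite, Sylvester's criterion (all leading principal minors positive) reduces to Eq. (8). The plain-Python sketch below checks the leading principal minors of a symmetric 3×3 stiffness matrix. The symmetry $k_{yx} = k_{xy}$, the translational stiffness values and the function names are assumptions for illustration; only the two $k_\theta$ values are taken from the experiments of Fig. 7 (in Nmm/rad).

```python
def det2(m):
    """Determinant of the upper-left 2x2 block of m."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_positive_definite(k):
    """Sylvester's criterion; valid only for a symmetric matrix."""
    return k[0][0] > 0 and det2(k) > 0 and det3(k) > 0


def decoupled_stiffness(k_xx, k_xy, k_yy, k_theta):
    """Eq. (5) with the decoupling conditions (6)-(7), assuming k_yx = k_xy."""
    return [[k_xx, k_xy, 0.0],
            [k_xy, k_yy, 0.0],
            [0.0, 0.0, k_theta]]


# Hypothetical translational stiffnesses (N/mm); k_theta values from Fig. 7.
stable = is_positive_definite(decoupled_stiffness(2.0, 0.5, 3.0, 376.0))
unstable = is_positive_definite(decoupled_stiffness(2.0, 0.5, 3.0, -114.0))
print(stable, unstable)  # True False
```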

This condition can also be expressed in terms of the virtual spring constants:

$$\sum_{i=1}^{n} \left( k_{xi} r_i^{2} - k_{yi} \delta_{0i} r_i \right) > 0 \qquad (9)$$

where $r_i$ is the distance between the compliance center and finger tip i, and $\delta_{0i}$ is the
initial compression of the virtual spring along the $y_i$ axis. Eq. (9) also shows that the
virtual spring along the $x_i$ axis plays an important role in stable grasping. For example,
suppose that in the two-fingered model of Fig. 4(b), $k_{x1} = k_{x2} = 0$. Eq. (9) then gives
$k_\theta = -\sum_i k_{yi} \delta_{0i} r_i < 0$, so the grasping system is unstable. This
confirms that the virtual spring along the $x_i$ axis is important for stable grasping.
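Eq. (9) is easy to evaluate numerically. The sketch below sums the per-finger contributions to $k_\theta$ and, for identical fingers, gives the smallest tangential stiffness that makes each contribution positive ($k_x r^2 - k_y \delta_0 r > 0 \Rightarrow k_x > k_y \delta_0 / r$). The tuple layout, function names and numerical values are hypothetical.

```python
def rotational_stiffness(fingers):
    """k_theta from Eq. (9).

    fingers: iterable of (k_x, k_y, delta_0, r) per finger, with k_x and k_y in
    N/mm and delta_0 and r in mm, so the result is in N mm/rad.
    """
    return sum(k_x * r ** 2 - k_y * d0 * r for k_x, k_y, d0, r in fingers)


def min_tangential_stiffness(k_y, d0, r):
    """Smallest k_x giving a positive contribution for one finger: k_x > k_y*d0/r."""
    return k_y * d0 / r


# Hypothetical two-fingered grasp: k_y = 1 N/mm, delta_0 = 5 mm, r = 20 mm.
no_tangential = rotational_stiffness([(0.0, 1.0, 5.0, 20.0)] * 2)
with_tangential = rotational_stiffness([(1.0, 1.0, 5.0, 20.0)] * 2)
print(no_tangential)    # -200.0: unstable, as in the k_x1 = k_x2 = 0 example
print(with_tangential)  # 600.0: stable
print(min_tangential_stiffness(1.0, 5.0, 20.0))  # 0.25 N/mm
```

The threshold shows the trade-off implied by Eq. (9): squeezing harder (larger $k_{yi}\delta_{0i}$) raises the tangential stiffness needed for stability.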

3.2 Experimental Verification 

The derived stability condition is verified experimentally using the two-fingered robot hand
developed with compliance control capability. The hand has three joints per finger; in this
research, however, the base joint is mechanically locked and only two joints of each finger are
used. A TDT sensor specially designed for this robot finger is installed close to each joint to
measure the joint torque, and a torque feedback loop is built around each joint using this sensor.
To give the finger tip a desired stiffness, each joint torque is controlled by the Active Stiffness
Control method proposed by Salisbury [8]. When the desired stiffness matrix at the tip of
finger i is $K_i$, the joint stiffness matrix $K_{qi}$ is set according to

$$K_{qi} = J_i^{T} K_i J_i \qquad (10)$$

where $J_i$ is the Jacobian matrix of finger i and the superscript T denotes the transpose. Using
this relation, the joint torque control law can be written as

$$\tau_i = K_{qi} \left( q_{ri} - q_i \right) \qquad (11)$$

where $\tau_i$, $q_{ri}$ and $q_i$ are the joint torque vector, the reference joint angle vector and
the current joint angle vector of finger i, respectively. To manipulate the object, the desired
object trajectory is transformed into a trajectory in the joint space of each finger i using the
appropriate inverse kinematic equations, and this joint trajectory is used as the reference input
vector $q_{ri}$. Figure 6 shows the block diagram of the control system.
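Eqs. (10) and (11) can be sketched for one finger with its two active joints, modelled here as a planar two-link chain. The link lengths, tip stiffness values and all function names are hypothetical. The sketch assumes $K_i$ is expressed in the finger's base frame; the helper `rotate_stiffness` (also hypothetical) converts a stiffness given in a tip frame rotated by an angle phi. Like Eq. (10), the mapping neglects the extra terms that appear when the Jacobian varies with configuration under load.

```python
import math


def jacobian_2r(q1, q2, l1, l2):
    """Tip Jacobian of a planar two-link finger (joint angles in rad)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rotate_stiffness(k_tip, phi):
    """Express a 2x2 stiffness given in a frame rotated by phi in the base frame: R K R^T."""
    c, s = math.cos(phi), math.sin(phi)
    r = [[c, -s], [s, c]]
    return matmul(matmul(r, k_tip), transpose(r))


def joint_stiffness(k_cart, jac):
    """Eq. (10): K_q = J^T K J."""
    return matmul(matmul(transpose(jac), k_cart), jac)


def joint_torques(k_q, q_ref, q):
    """Eq. (11): tau = K_q (q_r - q)."""
    err = [qr - qc for qr, qc in zip(q_ref, q)]
    return [sum(k_q[i][j] * err[j] for j in range(len(err))) for i in range(len(k_q))]


# Hypothetical finger: l1 = l2 = 50 mm, second joint bent 90 degrees,
# desired tip stiffness 1 N/mm along x and 2 N/mm along y (base frame).
J = jacobian_2r(0.0, math.pi / 2, 50.0, 50.0)
K_q = joint_stiffness([[1.0, 0.0], [0.0, 2.0]], J)
tau = joint_torques(K_q, [0.01, math.pi / 2], [0.0, math.pi / 2])
print(K_q)  # about [[7500, 2500], [2500, 2500]] N mm/rad
print(tau)  # about [75, 25] N mm for a 0.01 rad error at joint 1
```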

In each experiment, the two fingers first grasp the object with a small offset angle from the
horizontal. The grasp parameters are then set for the horizontal orientation, and the motion of
the object is observed from the trajectories of LEDs mounted on the finger tips.
Figure 7 shows two typical experimental results: Fig. 7(a) corresponds to a stable
condition ($k_\theta$ = 376 Nmm/rad) and Fig. 7(b) to an unstable condition ($k_\theta$ = −114 Nmm/rad). The
photographs were taken in a dark room with the camera shutter held open, and a flash was
fired at both the initial and final states. Despite a relatively large offset angle, Fig. 7(a)
shows the object recovering from the offset state to the equilibrium state. In contrast,
Fig. 7(b) shows the grasp collapsing even for a small offset angle.
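A single-degree-of-freedom, linearized model of the object's rotation about the compliance center, $I\ddot\theta + c\dot\theta + k_\theta \theta = 0$, is enough to show why the sign of $k_\theta$ separates the two outcomes of Fig. 7. This is not the authors' analysis: the inertia I, the damping c, the initial offset and the function name are hypothetical, and only the two $k_\theta$ values come from the experiments. Because the model is linear it holds only for small angles, so the large angle it produces in the unstable case should be read only as a sign of divergence.

```python
def simulate_rotation(k_theta, theta0, inertia=1.0, damping=20.0, dt=1e-3, t_end=1.0):
    """Integrate I*theta'' + c*theta' + k_theta*theta = 0 with semi-implicit Euler.

    Hypothetical units: k_theta in N mm/rad, inertia in N mm s^2/rad,
    damping in N mm s/rad; theta0 is the initial offset angle in rad.
    """
    theta, omega = theta0, 0.0
    for _ in range(int(round(t_end / dt))):
        alpha = (-k_theta * theta - damping * omega) / inertia
        omega += alpha * dt
        theta += omega * dt
    return theta


stable = simulate_rotation(376.0, 0.05)     # k_theta of Fig. 7(a)
unstable = simulate_rotation(-114.0, 0.05)  # k_theta of Fig. 7(b)
print(f"k_theta = 376:  theta(1 s) = {stable:.2e} rad")
print(f"k_theta = -114: theta(1 s) = {unstable:.2e} rad")
```

With a negative $k_\theta$ the characteristic equation has a positive real root, so no amount of damping prevents the offset from growing.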




[Fig. 6 block diagram. Inputs: object trajectory; stable condition ($F_0$ and desired object stiffness parameters). Stages: (1) calculation of virtual springs $k_{xi}$, $k_{yi}$, $\delta_i$; (2) calculation of joint control parameters $(K_q)_i$, $q_{ri}$; block for Finger 1.]

Fig. 6 Control system



[Photograph; arrow marks the direction of rotation]

Fig. 7 Grasping experiments: (a) stable condition ($k_\theta$ = 376 Nmm/rad)






Fig. 7 Grasping experiments: (b) unstable condition ($k_\theta$ = −114 Nmm/rad)









4. Conclusions 

The results of this paper are summarized as follows.

(1) A Tension Differential type Torque sensor (TDT sensor) has been proposed, based on
the idea that the torque around a drive pulley is proportional to the tension difference, which
can be measured directly without sensing the individual tendon tensions.

(2) A stiffness-based model of stable grasping has been proposed, and it has been shown that,
in terms of stiffness parameters, the stable grasping condition can be described in a simple form.

(3) A method of manipulating an object while maintaining the stable grasping condition has
been proposed.

(4) A robotic hand system with two articulated fingers and compliance control capability
has been developed, and the analytical results have been confirmed experimentally with this system.



References 

1. Baker, B. S., Fortune, S. J., Grosse, E. H.: Stable Prehension with a Multi-Fingered
Hand, Proc. IEEE Int. Conf. on Robotics and Automation, St. Louis, 570-575 (1985).

2. Hanafusa, H., Asada, H.: Stable Prehension by a Robot Hand with Elastic Fingers, Proc.
of 7th Int. Symp. on Industrial Robots, Tokyo, 361-368 (1977).

3. Jacobsen, S. C., Iversen, E. K., Knutti, R. T.: Design of the Utah/M.I.T. Dexterous
Hand, Proc. IEEE Int. Conf. on Robotics and Automation, San Francisco, 1520-1532 (1986).

4. Dario, P., Buttazzo, G.: An Anthropomorphic Robot Finger for Investigating Artificial
Tactile Perception, Int. Journal of Robotics Research, Vol. 6, No. 3, 25-48 (1987).

5. Kobayashi, H.: Control and Geometrical Consideration for an Articulated Robot Hand,
Int. Journal of Robotics Research, Vol. 4, No. 1, 3-21 (1985).

6. Nguyen, V.: The Synthesis of Stable Grasps in the Plane, Proc. IEEE Int. Conf. on
Robotics and Automation, San Francisco, 884-889 (1986).

7. Okada, T.: Object-Handling System for Manual Industry, IEEE Trans. Systems, Man,
and Cybernetics, Vol. SMC-9, No. 2, 79-89 (1979).

8. Salisbury, J. K.: Active Stiffness Control of a Manipulator in Cartesian Coordinates,
Proc. 19th IEEE Conf. on Decision and Control, 95-100 (1980).

9. Salisbury, J. K., Craig, J. J.: Articulated Hands: Force Control and Kinematic Issues,
Int. Journal of Robotics Research, Vol. 1, No. 4, 4-17 (1982).

10. Tsusaka, Y., Sawasaka, N., Inoue, H.: Turning an Object between Reposing Postures,
Proc. 5th Annual Meeting of the Robotics Society of Japan (in Japanese), Tukuba,
451-452 (1987).




Part 3 

Locomotion 





MOBILE ROBOTS: THE LESSONS FROM NATURE



D. J. Todd 

Department of Mechanical Engineering 
University of Edinburgh 
King’s Buildings 
Mayfield Road 
Edinburgh EH9 3JL 
Scotland 



1. ABSTRACT 

This paper records some of the ways in which biology has contributed to the technology of
locomotion and discusses a number of features found in animals which have yet to be fully
exploited. These subjects are illustrated with reference to the design of a legged vehicle being
developed at Edinburgh. The paper goes on to list a number of additional design ideas drawn
from nature which seem to have potential for future legged robots.



2. INTRODUCTION

The connection between biology and robotics is at its most striking in the phenomenon of 
locomotion. Locomotion is the hall-mark of the animal world, and it is when we make mobile 
robots that we seem to be emulating biology most closely; here, if anywhere, we ought to be 
able to learn from nature. 

This has long been acknowledged and, especially in the case of locomotion with legs, many 
principles derived from the observation of legged animals have consciously or unconsciously 
been incorporated into the design of legged robots. 

This paper records some of the ways in which biology has contributed to the technology of 
locomotion and discusses a number of features found in animals which have yet to be fully 
exploited. 

It proceeds largely by discussing the ways in which biological observations have 
influenced the design of a particular robot, and by listing other animal mechanisms which 
have not yet been developed to any great extent, although tentative trials have been made of 
some of them. 

Since most of my work is on legged locomotion, my observations of nature have been 
concentrated on how animals walk and run, and these observations are relevant almost 
exclusively to legged vehicles. However, one cannot help noticing some behavioural aspects 
of animal life, which would seem to be relevant to all kinds of mobile robot, whatever their 
locomotion method. 

Particularly interesting is the way in which animal behaviour is usually appropriate: 
animals are much less likely than machines to walk off cliffs, get stuck in corners or load a
damaged limb to destruction. The final section of the paper suggests that some current
research, together with aspects still to be investigated, can be regarded as constituting a
discipline of ethorobotics, and that an ethorobotic perspective is a good way of handling
complexity.



3. THE INFLUENCE OF NATURE ON THE DIRECTION OF LEGGED ROBOT 
RESEARCH 

In the early days of research on artificial legged locomotion, there were two main themes; 
first, prosthetic and orthotic aids to human walking, together with man-amplifiers or powered 
exoskeletons; and, second, legged vehicles for rough-terrain transport. 

The first theme is so closely bound up with human locomotion that it hardly makes sense to 
speak of lessons from nature: the whole enterprise is one of assisting nature. 

In the second, however, we see a direct attempt to imitate nature in machines. The idea that 
intermittent support by the cyclic motions of a set of levers is a feasible and appropriate 
method of locomotion is itself the first and most crucial, if largely unconscious, observation 
from nature. It was observations of the high mobility of legged animals which led to the view 
that legged machines could have advantages over wheeled ones; the main advantages being 
the ability to travel on discontinuous surfaces, and the superior efficiency of legs compared 
with wheels on soft ground. 

One of the most decisive influences nature has had is in the choice of leg number and 
geometry. There is no theoretical upper limit to the number of legs, or on where they should 
be placed, or on how many joints they should have. But since evolution selects only viable 
designs (although it may miss some, as it missed the wheel), nature provides us with 
examples which we know can be effective. We are all familiar with examples of impressive 
legged locomotion configurations: the horse, the beetle, the spider, ourselves. The direction 
taken by the pioneers of research on legged robots was largely governed by these examples. 

Once a mechanical configuration has been chosen, the next issues are the interlinked ones 
of balance and gait. Here again, the possibilities are infinite but nature has given us examples 
of effective gaits, such as the alternating tripod gait of hexapods, and the trot and gallop in 
quadrupeds. Of course, once these have been identified in nature it is possible to formulate a 
systematic theory, in some cases finding that by changing one or two parameters it is possible 
to generate a series of apparently different gaits from a single formula. 

These influences are readily apparent in the design of the first walking robots. Even 
machines radically different from anything in nature, such as Raibert’s monopod (Raibert 
1986), have been largely inspired by a desire to understand and eventually imitate natural 
locomotion. In this case the apparent departure from natural models is actually a device for 
studying natural behaviours such as hopping and running in a simplified form. 

Rather than attempt an exhaustive list of the influences of biology on machine design, I 
shall illustrate the theme by describing the way in which a specific machine incorporated 
some lessons from nature, then go on to deal with some under-exploited mechanisms. 







4. INFLUENCES ON THE HYDRAULIC TEST VEHICLE 

This machine, which has been under development in the Department of Mechanical 
Engineering at Edinburgh University for about three years, was intended as a general-purpose 
test rig to allow us to study various issues on a scale relevant to the design of large vehicles, 
while at the same time keeping costs to a minimum. It is not particularly original in concept: 
the leg design uses a pantograph derived from the Ohio State University Adaptive Suspension 
Vehicle (Song and Waldron 1989), and its speed and agility are likely to remain poorer than
those of many existing machines. However, since it uses relatively cheap components and
simple control methods, it may pave the way for an extension of the areas in which legged 
robots will be economic. 

The general arrangement of the machine is shown in Figure 1. The design was influenced 
by nature in several respects. First, the structure is like a vertebrate, based on a narrow spine 
rather than a space frame. This structure was chosen because of the extra agility conferred on 
an animal by bending the spine. We envisaged that one day it might be desirable to introduce 
joints into the body structure, and this is easier with a spine than with a framework. 

A second strong influence was in the choice of gait. The machine has been operated with
four legs and with six. In quadruped mode, lateral stability is a significant problem for a narrow
and not very rigid machine. Animal gaits were extensively studied in an attempt to find the
most suitable. The ideal gait for such a machine, in terms of speed, is the trot, but this is
difficult to stabilize at low speeds. What is needed is a smooth transition from a crawl (in
which only one leg at a time is lifted) to a trot. The work reported in Stuart et al. (1973) shows
that such a transition is possible and that it occurs in cats. A gait with these characteristics is shown in Figure 2.
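The caption of Figure 2 describes the gait family by its duty factor β: 1/2 for the trot, 3/4 for the crawl, and 5/8 for an intermediate gait in which 2- and 3-leg support alternate. The sketch below counts the feet on the ground through one cycle. The phase scheme is an assumption, not taken from the paper: the two legs of each diagonal pair are offset by φ = β − 1/2, which happens to reproduce all three support patterns in the caption. All names are hypothetical.

```python
from fractions import Fraction as F


def stance_starts(beta):
    """Assumed touchdown phase of each leg for duty factor beta (1/2 <= beta <= 3/4).

    Hypothetical scheme: diagonal partners (LF/RH and RF/LH) are offset by
    phi = beta - 1/2, so phi = 0 gives the trot and phi = 1/4 the crawl.
    """
    phi = beta - F(1, 2)
    return {"LF": F(0), "RH": phi, "RF": F(1, 2), "LH": F(1, 2) + phi}


def on_ground(t, start, beta):
    """A foot is in stance for the fraction beta of the cycle after touchdown."""
    return (t - start) % 1 < beta


def support_counts(beta, samples=16):
    """Number of feet on the ground at evenly spaced instants of one cycle."""
    starts = stance_starts(beta)
    counts = []
    for k in range(samples):
        t = F(2 * k + 1, 2 * samples)  # interval midpoints avoid touchdown/lift-off instants
        counts.append(sum(on_ground(t, s, beta) for s in starts.values()))
    return counts


for name, beta in [("trot", F(1, 2)), ("intermediate", F(5, 8)), ("crawl", F(3, 4))]:
    print(name, support_counts(beta))
```

Exact fractions are used so that the phase arithmetic has no rounding at the boundaries between support phases.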

In hexapod mode the machine’s main gait is the alternating tripod, well known in insects; 
see, for example, Graham (1985), or, better still, go out and watch a beetle. 

Another way in which observations from nature influenced design was in the leg 
disposition in hexapod mode. The question is, in going from four legs to six, where should the 
two extra legs be located? This can be decided by noting that arthropods with many legs face 
the problem of interference between adjacent legs. One solution, common for leg numbers 
between six and perhaps twenty (but not seen with the very large leg numbers found in 
millipedes), is to offset the legs so they can overlap each other. (See Figure 3; also
Manton (1977).) In the case of hexapods this has the additional advantage that when standing
on a tripod of legs the load distribution is made more even, so the more distal sections of the 
middle legs need be no stronger than those of the end ones. Further, lateral stability is 
improved because the base of support is wider (Figure 4). 
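One common way to quantify the effect of a wider base is the static stability margin: the smallest distance from the vertical projection of the centre of mass to an edge of the support polygon. The sketch below computes it for a tripod made of the front-left, rear-left and right-middle feet, with the right-middle foot placed either 0.5 m or 0.8 m from the centreline. The foot coordinates (in metres, centre of mass at the origin) and the function name are hypothetical.

```python
import math


def stability_margin(feet, com=(0.0, 0.0)):
    """Smallest signed distance from the CoM projection to the support polygon edges.

    feet: foot positions (x, y) listed counterclockwise. A positive result means
    the CoM lies inside the polygon, i.e. the stance is statically stable.
    """
    margins = []
    n = len(feet)
    for i in range(n):
        (ax, ay), (bx, by) = feet[i], feet[(i + 1) % n]
        ex, ey = bx - ax, by - ay
        # 2D cross product of the edge with (com - a): positive if com is to the left
        cross = ex * (com[1] - ay) - ey * (com[0] - ax)
        margins.append(cross / math.hypot(ex, ey))
    return min(margins)


# Tripod: right-middle, left-front and left-rear feet (counterclockwise order).
narrow = stability_margin([(0.0, -0.5), (1.0, 0.5), (-1.0, 0.5)])
wide = stability_margin([(0.0, -0.8), (1.0, 0.5), (-1.0, 0.5)])
print(f"middle foot 0.5 m out: margin {narrow:.3f} m")
print(f"middle foot 0.8 m out: margin {wide:.3f} m")
```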

Other design features based on nature have been considered for this vehicle: for 
example, primitive experiments were made on the use of a ‘head’ and ‘tail’ as balancing aids, 
and of wide feet with locking ankle joints to improve lateral stability. These principles are not 
currently implemented, and are described in the next section. 

5. FURTHER LESSONS FROM NATURE 

This section lists some biological mechanisms which have yet to be exploited by robot 
designers to any extent. 







Balancing aids 

Machines such as Ohio State University’s ASV can to some extent adapt to sloping ground 
by adjusting differential leg height or average leg angle, but this may have undesirable 
consequences such as limiting step length. An alternative or supplementary method of 
balancing on slopes is to move the centre of mass of the machine by extending, swivelling or 
bending extra appendages. The problems of balance in the sagittal and frontal planes are 
similar, but they differ in that long members sticking out sideways are undesirable as they 
would prevent passage through narrow gaps. Figure 5 shows how a weight-carrying flexible 
appendage at each end of the vehicle could be used to move the centre of mass both laterally 
and longitudinally. Such balancing devices might also be useful for obstacle crossing, by 
allowing normally unstable postures (Figure 6). 
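How far a neck or tail weight can move the centre of mass follows from the mass-weighted mean of the part positions. The sketch below assumes a hypothetical 400 kg body with a 20 kg weight at the end of each appendage; all masses, positions and names are invented for illustration.

```python
def centre_of_mass(parts):
    """Composite centre of mass in the ground plane.

    parts: list of (mass, (x, y)); x is longitudinal, y is lateral.
    """
    total = sum(m for m, _ in parts)
    x = sum(m * p[0] for m, p in parts) / total
    y = sum(m * p[1] for m, p in parts) / total
    return x, y


body = (400.0, (0.0, 0.0))  # hypothetical body mass (kg) and position (m)
neutral = [body, (20.0, (1.5, 0.0)), (20.0, (-1.5, 0.0))]
swung = [body, (20.0, (1.5, 0.5)), (20.0, (-1.5, 0.5))]  # both weights swung 0.5 m uphill
print(centre_of_mass(neutral))
print(centre_of_mass(swung))
```

The lateral shift equals the moved mass times its displacement divided by the total mass (here 20 × 0.5 × 2 / 440 m, about 45 mm), so a useful effect requires appendage masses that are a noticeable fraction of the vehicle mass.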

Substrate reaction 

A flexible or jointed neck and tail could also be used to push against the ground. If a 
machine were in danger of overbalancing it could brace an appendage against the ground 
while seeking a new foothold; if the appendage were prehensile or equipped with a gripper (as 
the arms might be on some robots) it could perhaps be used to pull the robot up a slope or out 
of a ditch. A more dramatic use would be returning a robot to its feet if it fell over completely 
(Figure 7). 

Crab leg arrangement 

A six- or eight-legged machine with a row of three or four legs at each end, instead of a 
row down each side as is usual, would be relatively narrow and able to pass through confined 
spaces. The Odex I robot made by Odetics Inc. can adopt this configuration for going through 
doorways. This arrangement is similar to that of a crab when it walks sideways. Crabs 
presumably walk sideways for the same reason; given their body shape, walking sideways lets 
them pass through narrow apertures. 

Spine flexing 

All mobile robots so far have had a rigid body (apart from one or two articulated wheeled 
vehicles proposed for planetary exploration). All vertebrates, however, can bend the body to 
some degree. An obvious advantage is the ability to bend round obstacles, snakes being the 
ultimate example of this, but there are other advantages. Bending the body can aid in getting 
up from a lying position, it can help in reaching with the head or arms, and it can increase the 
speed of locomotion. The supreme example of this last use is the cheetah. On each stride the 
spine, by bending, effectively changes its length, adding to the stride. In addition, its bending 
effectively introduces another joint at the hip and shoulder, allowing the leg to swing through 
a greater angle, and again increasing the stride length. Something similar happens in 
salamanders, but with the spine flexing in a horizontal plane (Hildebrand 1974). 

Folding Feet 

For robots which walk slowly enough for there to be significant phases relying on static 
stability, the size of the feet, both laterally and lengthwise, can be important. This is 
particularly so for bipeds. 







Bipedal animals, including man, tend to have relatively large feet. In particular, birds such
as ducks clearly use the width of the foot with outspread toes as a means to stability. The 
problem is that such wide feet tend to interfere with each other during walking. 

The solution some birds adopt is to fold up the foot when it is raised on every step so it is 
compact as it passes the supporting leg; the toes are then spread again as the foot is lowered. 
This simple mechanism would seem quite suitable for some legged robots (Figure 8). 

Conformal Locomotion 

Observation of animals in confined spaces suggests that one of the reasons they need so 
much less space than vehicles such as automated guided vehicles (AGVs) is that they can
operate in contact with obstructions, and even gain support and propulsion from them. The 
ultimate example is the snake, which propels itself entirely by reacting its body against the 
environment. 

A conformal vehicle is one which conforms to the geometry of the fixed objects it
encounters, by sensing them and altering its path and often its own shape. Such shape
changes range from simply altering the track width of a wheeled vehicle to the sinuous
bending of a snake. (At least one robot has been made using the snake principle: see Hirose
and Umetani (1977).) An intermediate case is that of a legged vehicle whose legs can be
folded or extended to get them past obstacles.

A conformal vehicle could squeeze through a narrow gap; it could lean on walls if this
helped (a method not unknown in humans in conditions of reduced sensorimotor ability); and it
could make efficient use of the space in tight corners.

Conformability rests on a combination of principles: 

The use of support from walls; 

The ability to brush past or along a surface using a 
‘fur’ of touch or proximity sensors; 

The ability to back-track a sequence of actions when 
blocked and try again; 

A reflex-like folding of the legs when passing an 
obstacle. 

There would seem to be a large field of research open here. 

Soil Testing 

When one walks on a beach or an icy pavement one tests certain properties of the substrate 
on every step, although usually without being very conscious of it. These tests, mainly of 
sinkage and friction, enable one to modify one’s gait and route so as to minimise effort or 
reduce the danger of bogging down or slipping. 

It would be useful for a walking robot to similarly sense the properties of the soil on each 
step. The most obvious properties are sinkage and friction. 

Relationships between depth of sinkage and pressure have been derived for various soil 
models in studies of building foundations and the mobility of wheeled and tracked vehicles. 
Using such a relationship, with experimentally determined parameters, it would be possible to
measure the sinkage under a known foot loading and predict the sinkage under other loadings; or a
cruder analysis might suffice. In any case, some means of measuring foot sinkage is needed.
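The text does not name a particular soil model. One widely used choice, swapped in here as an assumption, is Bekker's pressure-sinkage relation p = (k_c/b + k_φ) z^n; for a foot of fixed width b the bracket can be lumped into a single modulus k, giving p = k z^n. With the exponent n assumed known, one measured sinkage under a known load fixes k, and the sinkage under another load follows by inverting the relation. The exponent, foot area, loads and function names below are hypothetical.

```python
def foot_pressure(load_n, area_m2):
    """Mean ground pressure under a foot (Pa)."""
    return load_n / area_m2


def fit_modulus(pressure, sinkage, n):
    """Lumped modulus k in p = k * z**n from one measured (pressure, sinkage) pair."""
    return pressure / sinkage ** n


def predict_sinkage(pressure, k, n):
    """Invert p = k * z**n to get the sinkage z (m) at another pressure."""
    return (pressure / k) ** (1.0 / n)


n = 0.8      # hypothetical soil exponent
area = 0.04  # hypothetical foot area, m^2
k = fit_modulus(foot_pressure(1000.0, area), 0.02, n)  # 20 mm measured at 1000 N
z_2000 = predict_sinkage(foot_pressure(2000.0, area), k, n)
print(f"predicted sinkage at 2000 N: {z_2000 * 1000:.1f} mm")
```

Because only the ratio of pressures matters once n is fixed, this reduces to z2 = z1 (p2/p1)^(1/n); measuring sinkage at two loads on successive steps would also allow n itself to be estimated.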







We have experimented with a foot-mounted penetrometer, but a better method is to fit each 
foot with pads which rest lightly on the soil and provide a reference for measuring depth (and 
rate) of sinkage on each step (see Figure 9). 

Finding how slippery a surface is can be done in several ways. A foot could be equipped 
with a probe to be dragged a short way across the surface, or with a small powered wheel, to 
directly measure the frictional forces at a certain load. Alternatively, the forces developed as 
one leg moved relative to another could be measured. Thirdly, for dynamic gaits the robot 
could sense the patterns of force, acceleration and displacement as each foot touched down: 
sudden horizontal movements would indicate slipping. 
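The third method, watching for sudden horizontal foot movement after touchdown, can be sketched as a threshold test on the horizontal speed of a stance foot. The sketch assumes foot positions are already available in a ground-fixed frame (for example from leg kinematics combined with an estimate of body motion); the sample data, sampling period, threshold and function name are hypothetical.

```python
import math


def detect_slip(positions, dt, speed_threshold):
    """Indices of samples at which a stance foot moves faster than speed_threshold.

    positions: (x, y) foot positions in a ground-fixed frame during stance (m).
    dt: sampling period (s); speed_threshold in m/s.
    """
    slips = []
    for i in range(1, len(positions)):
        (x0, y0), (x1, y1) = positions[i - 1], positions[i]
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed > speed_threshold:
            slips.append(i)
    return slips


# Hypothetical stance phase sampled every 10 ms: slight creep, then a 20 mm jump.
track = [(0.0, 0.0), (0.0005, 0.0), (0.001, 0.0), (0.021, 0.0), (0.0215, 0.0)]
print(detect_slip(track, 0.01, 0.5))  # [3]
```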



6. ETHOROBOTICS 

Much research on mobile robots has consciously or, more often, unconsciously drawn on
ideas arising from a study of animal behaviour. Usually these ideas have not been taken from
the ethological literature but represent a rather casual view of the way animals behave. The
term ‘ethorobotics’ is meant to denote the explicit incorporation of ethological observations
into robot control systems.

What specific contributions might the ethorobotic perspective make? It might begin with 
the issue of autonomy, survival and appropriate behaviour. A comparison of different animals 
leads to the (rather obvious) conclusion that autonomy is not the same thing as intelligence. 
While most research in artificial intelligence (AI) has been directed towards intelligence, a 
mobile robot also needs a high degree of autonomy. What is striking about animals is that 
even simple animals such as woodlice, with very little intelligence in the sense of reasoning 
ability, and with relatively simple senses, are highly autonomous: they need nobody to 
initialize their navigation systems, or turn them on, or get them out of corners. One of the
aims of the ethological approach to robotics should be to refine the concept of autonomy and 
to devise ways of conferring it on control structures. (For example, how important is it for an 
agent to be able to set its own goals?) 

Autonomy is also bound up with the concept of appropriate behaviour: the ability to adjust to
circumstances even if unfamiliar or unintelligible. A robot should never be found stuck in a
corner with its wheels grinding away because a sensor has failed. A corollary to this approach
is that a robot should know when it is being interfered with, for example by being relocated, 
or turned over, or damaged. It may be unable to continue as normal, but at least it should be 
able to function in some organized way, if only by shutting itself down and waiting for 
rescue. 

Some research in robotics, for example that of Brooks (1986), does address these issues. 

A second example of the ethorobotic approach is the use of simple behavioural 
mechanisms such as taxes and tropisms for getting about in a disordered environment. Many 
invertebrates use such ways of finding food or shelter or a mate. These methods rely on 
detecting the intensity or gradient or direction of some environmental property, or on 
searching for an easily recognized marker. In a taxis an animal heads towards or away from a
stimulus, either by comparing the signals from two sensors or by using body movement to scan
with a single sensor. In a kinesis, the rate of turning or the speed of locomotion is governed by
the intensity of a stimulus. A robot might search for its home base or for some target using such
methods; the target might be an unexploded mine, a gas leak, a weed or pest, a fragment of a
crashed aircraft... Sensing methods could include magnetic and chemical sensors as well as
touch and vision.
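The two mechanisms can be sketched in a few lines. In the two-sensor taxis, the robot turns toward whichever of two forward-angled sensors reads the stronger stimulus; in the kinesis (here orthokinesis), only the forward speed changes with intensity and no directional information is used. The stimulus field, sensor geometry, gains and function names are all hypothetical.

```python
import math


def intensity(x, y, source=(0.0, 0.0)):
    """Hypothetical stimulus field that decays with squared distance from the source."""
    d2 = (x - source[0]) ** 2 + (y - source[1]) ** 2
    return 1.0 / (1.0 + d2)


def tropotaxis_step(x, y, heading, gain=40.0, max_turn=0.3, speed=0.05,
                    spread=0.5, offset=0.2):
    """Two-sensor taxis: turn toward the stronger side (limited turn), then step forward."""
    left = intensity(x + offset * math.cos(heading + spread),
                     y + offset * math.sin(heading + spread))
    right = intensity(x + offset * math.cos(heading - spread),
                      y + offset * math.sin(heading - spread))
    turn = max(-max_turn, min(max_turn, gain * (left - right)))
    heading += turn
    return x + speed * math.cos(heading), y + speed * math.sin(heading), heading


def kinesis_speed(i, max_speed=0.05):
    """Orthokinesis: speed falls as intensity rises, so the robot lingers near the source."""
    return max_speed * max(0.0, 1.0 - i)


# Robot at (3, 2) facing +x; the source at the origin lies behind it to the right.
x, y, heading = tropotaxis_step(3.0, 2.0, 0.0)
print(f"new heading {heading:.3f} rad")  # negative: the robot turns right, toward the source
print(kinesis_speed(0.0), kinesis_speed(0.5))
```

Turning away from a stimulus (negative taxis) needs only a negative gain.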

A related idea is the sensing of multiple properties of the environment for navigation. 
Animals without imaging abilities must identify places mainly by sensing a highly specific 
property such as smell, or a combination of properties. Some properties which could be 
sensed by a robot are: 

Soil temperature, colour, hardness, texture, humidity 

Air temperature, pressure and humidity 

Wind speed and direction 

Ground slope and roughness 

Ambient light level, colour, polarization 

Ambient sound level and spectrum 

Magnetic field strength and direction 

Range to nearby objects 

Smell (chemical concentration in air or soil) 

Vegetation type. 

In a reasonably varied environment each place of interest may well have a unique signature 
if a vector of all these measurements is computed. Such signatures, together with dead 
reckoning, could provide the basis for a navigation system within a robot’s habitual territory 
of operation. 
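The place-signature idea can be sketched as nearest-neighbour matching. Each stored place is a vector of measurements, each feature is divided by a scale so that quantities in different units are comparable, and a reading is assigned to the closest stored signature only if it lies within a threshold distance. The features, scales, threshold and function names are hypothetical; in practice dead reckoning could restrict the search to nearby candidates.

```python
import math


def normalise(vec, scales):
    """Divide each feature by its scale so that different units are comparable."""
    return [v / s for v, s in zip(vec, scales)]


def recognise_place(reading, signatures, scales, max_distance):
    """Name of the stored place nearest to the reading, or None if none is close enough."""
    r = normalise(reading, scales)
    best_name, best_dist = None, float("inf")
    for name, sig in signatures.items():
        d = math.dist(r, normalise(sig, scales))
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= max_distance else None


# Hypothetical features: soil temperature (C), air humidity (%),
# ground slope (deg), magnetic field deviation (deg).
places = {"gate": [12.0, 80.0, 2.0, 5.0], "ditch": [9.0, 95.0, 15.0, 1.0]}
scales = [1.0, 5.0, 2.0, 1.0]
print(recognise_place([11.5, 82.0, 3.0, 4.5], places, scales, 2.0))  # gate
print(recognise_place([30.0, 20.0, 40.0, 0.0], places, scales, 2.0))  # None
```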

A further aspect of this concept is the marking by an animal of places it wishes to identify, 
by scent or other means. A robot might plant beacons (acoustic, optical, radio or chemical) to 
which it could subsequently refer. 

Other subjects, such as the scheduling of behaviour, on which there is already much 
research in AI, could also be regarded as within the domain of ethorobotics; yet other 
abilities, such as the use of imitation in acquiring skills, have not even begun to be explored 
for robotics. 

7. BIBLIOGRAPHY 

There is a substantial zoological literature relevant to robotics, and an adequate review 
would need a separate paper. This section just lists some sources the author has found 
particularly useful. 

The mechanics and energetics of vertebrate locomotion have been extensively discussed by 
R. McNeill Alexander (whose publications are too numerous to list), T. A. McMahon and M. 
Hildebrand (see references). Some extended treatments of locomotion in invertebrates are to 
be found in Manton (1977), Graham (1985), Stein et al. (1973) and Herreid and Fourtner. A
glance through the references in these works will reveal the journals where continuing 
research results are published. 

There is also a massive literature on ethology. Among the works the author has found 
helpful are Bateson and Hinde (1976), Hinde (1970) and Camhi (1984). 

For the reader with an interest in legged robots there are now several books on the subject 
as well as the regular journal and conference publications. Four books of general interest are 
Raibert (1986), Song and Waldron (1989), Sutherland (1983), and Todd (1985). 







References 

Alexander, R.McN. Animal Mechanics. Blackwell Scientific 1983
Alexander, R.McN. Elastic Mechanisms in Animal Movement. Cambridge University
Press 1988

Bateson, P.P.G.; Hinde, R.A. Growing Points in Ethology. Cambridge University Press 
1976 

Brooks, R. A. A robust layered control system for a mobile robot. IEEE J. Robotics and 
Automation, RA-2(1), 14-23 (March 1986) 

Camhi, J.M. Neuroethology. Sinauer Associates Inc. 1984 

Graham, D. Pattern and control of walking in insects. Advances in Insect Physiology, 18, 
31-140(1985) 

Herreid, C.F.; Fourtner, C.R. (eds.) Locomotion and Energetics in Arthropods. Plenum 
Press 

Hildebrand, M. Motions of the running cheetah and horse. J. Mammalogy, 40(4), 
481-495 (1959) 

Hildebrand, M. Analysis of Vertebrate Structure. John Wiley and Sons 1974 
Hildebrand, M.; Bramble, D.M.; Liem, K.F.; Wake, D.B. (eds.) Functional Vertebrate 
Morphology. Belknap Harvard 1985 
Hinde, R.A. Animal Behaviour. 2nd edition. McGraw-Hill 1970 
Hirose, S.; Umetani, Y. Kinematic control of an active cord mechanism with tactile 
sensors. In: 2nd CISM-IFToMM Symposium on Theory and Practice of Robots and 
Manipulators, pp. 241-252. Elsevier 1977 
Manton, S.M. The Arthropoda. Clarendon Press 1977 
McMahon, T.A. Mechanics of locomotion. IJRR, 3(2), 4-28 (1984) 

Raibert, M.H. Legged Robots that Balance. MIT Press 1986 
Song, S.-M.; Waldron, K.J. Machines that Walk. MIT Press 1989
Stein, R.B.; Pearson, K.G.; Smith, R.S.; Redford, J.B. (eds.) Control of Posture and 
Locomotion. Plenum Press 1973 

Stuart, D.G.; Withey, T.P.; Wetzel, M.C.; Goslow, G.E. (Jr.) Time constraints for inter- 
limb co-ordination in the cat during unrestrained locomotion. In: Stein et al. (eds.), 
pp. 537-560 

Sutherland, I.E. A Walking Robot. The Marcian Chronicles Inc. 1983 

Todd, D. J. Walking Machines: An Introduction to Legged Robots. Kogan Page 1985 








Figure 1: Hydraulic test vehicle








Three stages of a general-purpose gait. The horizontal axis represents time within one stepping cycle; the length of a bar represents the fraction of the cycle for which that foot is on the ground (rows, from top: L Front, L Back, R Back, R Front). The three cases shown are:

a. Duty factor 1/2. The trot.

b. Duty factor 3/4. The quadruped crawl.

c. Duty factor 5/8. An intermediate gait: periods of 2- and 3-leg support alternate.



Figure 2. 
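The duty-factor cases of Figure 2 can be checked with a short sketch that counts the feet on the ground at evenly spaced instants of one stepping cycle. The touchdown phases are assumptions chosen for illustration (evenly spaced for the crawl-like cases, diagonal pairs in phase for the trot); the figure itself does not list them, and the function name is hypothetical.

```python
from fractions import Fraction as F


def support_counts(duty, touchdown_phases, samples=8):
    """Count the feet on the ground at `samples` evenly spaced instants of one cycle.

    duty: fraction of the cycle each foot spends on the ground (duty factor).
    touchdown_phases: for each foot, the cycle fraction at which it touches down
    (illustrative values, not read from the figure).
    """
    counts = []
    for i in range(samples):
        t = F(i, samples)  # current instant as a fraction of the cycle
        # A foot is in stance while less than `duty` of the cycle has elapsed since touchdown.
        on_ground = sum(1 for p in touchdown_phases if (t - p) % 1 < duty)
        counts.append(on_ground)
    return counts


# Touchdown phases in the order (L Front, R Back, R Front, L Back).
crawl_phases = [F(0), F(1, 4), F(1, 2), F(3, 4)]
trot_phases = [F(0), F(0), F(1, 2), F(1, 2)]  # diagonal pairs move together

print(support_counts(F(1, 2), trot_phases))   # trot: always 2 feet
print(support_counts(F(3, 4), crawl_phases))  # crawl: always 3 feet
print(support_counts(F(5, 8), crawl_phases))  # intermediate: 3 and 2 alternate
```

With a duty factor of exactly 3/4 and evenly spaced touchdowns, one foot is always in the air, so the count never reaches four; any larger duty factor adds four-foot support phases.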








Figure 3: Eight-legged and six-legged arthropods 
in which the middle legs overlap the end ones 




Figure 5: The use of a flexible neck and tail for 
both lateral and longitudinal stability on slopes 








The use of balancing aids in crossing an obstacle.

The head and tail masses move the centre of mass of the machine back
while the front end is unsupported; then, during the second half of the manoeuvre,
they swing forward to balance the unsupported rear of the machine.

Figure 6. 




Figure 7: The use of neck- and tail-like appendages 
to restore an overturned robot to its feet 







Figure 8: A wide foot which can fold to pass the 
other leg 







Figure 9: Two ways of measuring foot sinkage: left,
a spring-loaded penetration probe attached to the
foot; right, a reference plate which rests lightly
on the soil so that the sinkage of the main part of the
foot can be measured.



Mach 

"the 



.me — 
Model 



W 
CXT€ 
O f 



Iking 
at ion of 
Moti ora. 



A. Morecki and T. Zielinska
Institute of Aircraft Engineering and
Applied Mechanics, Warsaw University
of Technology, Al. Niepodleglosci 222,
00-665 Warszawa, Poland



Introduction 

The Robotics and Technical Biomechanics Group at the Warsaw University of Technology has been working on a quadruped walking machine for the last few years. The construction of the machine's legs imitates the limb structure of digitigrade mammals (e.g. horse, rabbit; Fig. 1, Fig. 2, Fig. 3).

The machine is a static walker [2,3]: at every instant, the projection of its centre of gravity onto the ground lies inside the support polygon. This mode of locomotion is possible provided that the machine moves slowly (at no more than 5 km/h) and its velocity does not change rapidly. Because such motion is easy to execute, most of the walking machines currently being built are static walkers.
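The static-walker condition can be made concrete with a small sketch. It assumes a convex support polygon whose vertices (foot positions) are listed counter-clockwise in ground coordinates, and it uses one common definition of the static stability margin: the shortest distance from the projected centre of gravity to the polygon boundary. The margin M_s used later in this paper is defined in its own Fig. 4 and may be measured differently; the function name and the numbers are hypothetical.

```python
import math


def stability_margin(cog, polygon):
    """Signed margin of the ground projection `cog` = (x, y) with respect to a convex
    support polygon given as a counter-clockwise list of foot positions (x, y).

    Positive: the projection lies inside the polygon (statically stable) and the value
    is its distance to the nearest edge. Zero or negative: on or outside the boundary.
    """
    x, y = cog
    margin = math.inf
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        ex, ey = x2 - x1, y2 - y1
        # 2D cross product: positive when the point lies to the left of the edge,
        # i.e. on the inner side of a counter-clockwise polygon.
        cross = ex * (y - y1) - ey * (x - x1)
        margin = min(margin, cross / math.hypot(ex, ey))
    return margin


# Hypothetical three-leg support triangle (metres) and two centre-of-gravity projections.
triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(stability_margin((0.5, 0.5), triangle))  # inside: margin 0.5
print(stability_margin((3.0, 3.0), triangle))  # outside: negative
```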

Once the structure of the machine's legs had been fixed, the next task was to find a gait (a fixed sequence of leg transfers) that would ensure the static stability [1,4] of the device.








Fig. 1. Structure of the animal’s legs 
G - centre of gravity of the body, 

T_1, T_2 - major muscles driving the leg.

1. The Possible Gaits 

The number of potentially possible gaits increases with the number of legs of the walking machine. A. B. Bessonov and N. N. Umnov gave the following formula for the number of gaits as a function of the number k of legs [4]:

$$N = \sum_{i=1}^{k} \frac{1}{i}\left[\, i^k - C_i^1 (i-1)^k + C_i^2 (i-2)^k - \dots + (-1)^{i-1} C_i^{i-1}\, 1^k \right] \qquad (1)$$

where $C_i^j = \binom{i}{j}$ denotes the number of combinations of j elements out of i.

The number N computed in this way describes all possible gaits, even those in which all the legs are above the ground (jump).
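Formula (1) can be evaluated directly. The bracketed sum is the inclusion-exclusion count of the ways to distribute k legs over i distinct events with no event left empty (the onto maps of a k-element set onto an i-element set); the division by i presumably identifies cyclic shifts of the same gait. The sketch below simply evaluates the formula as printed (the function name is illustrative) and reproduces the value N = 26 quoted in the text for k = 4.

```python
from math import comb


def gait_count(k):
    """Number of gaits N for a k-legged machine, evaluated from formula (1)."""
    total = 0
    for i in range(1, k + 1):
        # i^k - C(i,1)(i-1)^k + C(i,2)(i-2)^k - ... + (-1)^(i-1) C(i,i-1) 1^k
        bracket = sum((-1) ** j * comb(i, j) * (i - j) ** k for j in range(i))
        # The bracket equals i! * S(k, i), so it is always divisible by i.
        total += bracket // i
    return total


print(gait_count(4))  # 26, as quoted in the text for a quadruped
```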







N 1 = (k! /( 1 I (k-1) ! ) + k! /( (k-1 ) ! 1 ! )+ 

+ k I / ( 2 ! (k-2) I + k!/( (k-1) !2! ) )+ 

- ( k ! / ( 1 ! 1 ! (k-2) ! + k!/(l! (k-2) !1I ) + 

+ k!/( (k-2) ! 1 ! 1 1 ) ) (2) 

The difference

$$N_{st} = N - N_1 \qquad (3)$$

is equal to the number of gaits in which more than two legs are supported [4]. The statically stable gaits should be sought among these. For a four-legged machine:

$$N = 26, \quad N_1 = 20, \quad N_{st} = 26 - 20 = 6.$$

The N_st gaits of a quadruped walking machine are those with only three- or four-support phases. The gait recommended for the designed machine was sought among them.

2. The Choice of Gait 

The gait parameters are as follows: 

- duty factor: the relative time of leg-ground contact, constant for each leg during the execution of the given sequence of leg transfers (i.e. during the gait);

- relative phase: the relative difference in time between support instants, constant during each gait for the pairs of legs placed on each side of the body.

Both are relative values, computed with respect to the gait period, i.e. the time taken to execute the sequence of leg transfers. For gaits formed by consecutive three-support and four-support phases (crawl), the following formula for the duty factor was derived:

$$r_0 = \frac{3}{4} + \frac{3M_s}{2K + 4M_s} \qquad (4)$$
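Formula (4) can be wrapped in a small function; the name and the range checks are illustrative additions. With a zero stability margin the duty factor reduces to 3/4, and any positive margin raises it above 3/4, so that on average more than three legs are on the ground and four-support phases appear.

```python
def crawl_duty_factor(K, Ms):
    """Duty factor r_0 of the crawl from formula (4).

    K: virtual step length; Ms: static stability margin (same length unit).
    """
    if K <= 0 or Ms < 0:
        raise ValueError("expected K > 0 and Ms >= 0")
    if Ms >= K / 4:
        # At Ms = K/4 formula (4) gives r_0 = 1: no time would remain for the leg transfer.
        raise ValueError("formula (4) needs Ms < K/4 for r_0 < 1")
    return 0.75 + 3 * Ms / (2 * K + 4 * Ms)


print(crawl_duty_factor(1.0, 0.0))  # 0.75
print(crawl_duty_factor(2.0, 0.25))  # above 0.75: four-support phases appear
```

The bound M_s < K/4 that keeps r_0 below 1 is the same one that keeps r_1 in (5') positive.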







where K is the length of the virtual step [1] and M_s is the static stability margin [4] (Fig. 4).

To identify the instants at which the static stability margin is minimal, the coefficients r_1 and r_2 were defined. The static stability margin reaches its minimum when the foreleg end (foot) has shifted relative to the body by the distance r_1 K, measured from its position in the last three-support phase. The distance (1 - r_2)K is determined similarly for the hind leg (Fig. 5). In both cases the foreleg and the hind leg lying on the same diagonal of the body are considered.

The values of the coefficients r_1 and r_2 differ between the statically stable gaits.

Fig. 4. Static stability margin (the figure indicates the direction of the machine's motion).







Let us consider the possible gaits (possible sequences of leg transfers):

- sequence 0: left foreleg, right hind leg, right foreleg, left hind leg;

- sequence A: left foreleg, left hind leg, right hind leg, right foreleg;

- sequence B: left foreleg, right foreleg, left hind leg, right hind leg;

- sequence D: left foreleg, left hind leg, right foreleg, right hind leg;

- sequence E: left foreleg, right hind leg, left hind leg, right foreleg.



It is easily seen that sequences D and E do not maintain static stability (successive support triangles have no common points). For the other sequences the duty factor is given by formula (4).

The coefficients r_1 and r_2 differ for different gaits:

- sequence 0:

$$r_1 = \frac{2K - 2M_s}{3K}, \quad 0.5 < r_1 < 0.6 \qquad (5)$$

$$r_2 = r_1 + \frac{2M_s}{K} \qquad (6)$$

- other sequences:

$$r_1 = \frac{K - 4M_s}{3K}, \quad 0.0 < r_1 < 1/3 \qquad (5')$$

The following relationships hold:

- sequence A: r_2 = 0.0

- sequence B: r_2 = 1.0 - r_1     (7)

- sequence C: r_2 = 0.0
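Relations (5) to (7) can be collected into one helper. In this sketch the sequence labels are plain strings and the function name is illustrative; it returns r_1 and r_2 for gaits 0, A, B and C and reports whether r_1 falls inside the range stated with (5) or (5').

```python
def minimum_margin_coefficients(sequence, K, Ms):
    """Coefficients r1, r2 of relations (5)-(7) for gait sequence '0', 'A', 'B' or 'C'.

    Returns (r1, r2, in_range), where in_range tells whether r1 lies inside the
    interval stated with (5) for sequence 0 or with (5') for the other sequences.
    """
    if sequence == "0":
        r1 = (2 * K - 2 * Ms) / (3 * K)            # (5)
        r2 = r1 + 2 * Ms / K                        # (6)
        in_range = 0.5 < r1 < 0.6
    elif sequence in ("A", "B", "C"):
        r1 = (K - 4 * Ms) / (3 * K)                 # (5')
        r2 = 1.0 - r1 if sequence == "B" else 0.0   # (7)
        in_range = 0.0 < r1 < 1 / 3
    else:
        raise ValueError(f"no coefficients for sequence {sequence!r}; D and E are not statically stable")
    return r1, r2, in_range


for seq in ("0", "A", "B", "C"):
    print(seq, minimum_margin_coefficients(seq, K=1.0, Ms=0.15))
```

Algebraically, gaits 0 and B both give r_2 = (2K + 4M_s)/(3K), which is consistent with the remark that gaits B and 0 are comparable with respect to condition (9).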








Fig. 5. Leg-end trajectory relative to the body.

Taking into account the constraints imposed on the leg motion (maximum step length) and determining the relationship between the static stability margin and the projection of the centre of gravity onto the ground, we obtain:

$$S_1 + (1 - r_1)K < x_c' < S_2 - r_2 K \qquad (8)$$

$$S_3 + r_2 K < x_c' < S_4 - (1 - r_2)K \qquad (9)$$

where:

x_c' - coordinate of the projection of the machine's centre of gravity onto the ground [4], $x_c' = x_c - H\tan\alpha$,

x_c - coordinate of the centre of gravity,

H - height of the machine [4],

α - slope of the ground.

S_1, S_2, S_3, S_4 - coefficients that are functions of the geometric parameters of the machine and of the quantities determining the motion ranges of the legs:

s i = - V s 2 = D - s p' 

S 3 = -D - S^, S 4 = D - 

Sp, S^. are the sums of the geometric 
parameters of the machine, D is the parameter 
determining the leg range of motion. 
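Inequalities (8) and (9) can be combined into an admissible interval for x_c', and the definition x_c' = x_c - H tan α then turns that interval into a range of ground slopes. The sketch below assumes that both inequalities must hold at once; the values of S_1 to S_4, K, r_1, r_2, x_c and H are hypothetical numbers, not the machine's parameters, and the function names are illustrative.

```python
import math


def admissible_projection(S, K, r1, r2):
    """Open interval of x_c' satisfying both inequalities (8) and (9), or None if empty.

    S: the coefficients (S1, S2, S3, S4); the values used below are hypothetical.
    """
    S1, S2, S3, S4 = S
    lo8, hi8 = S1 + (1 - r1) * K, S2 - r2 * K   # inequality (8)
    lo9, hi9 = S3 + r2 * K, S4 - (1 - r2) * K   # inequality (9)
    lo, hi = max(lo8, lo9), min(hi8, hi9)
    return (lo, hi) if lo < hi else None


def admissible_slopes(x_c, H, interval):
    """Range of ground slope alpha (radians) keeping x_c' = x_c - H*tan(alpha) in `interval`."""
    lo, hi = interval
    # x_c' decreases as alpha grows, so the upper bound on x_c' sets the lower bound on alpha.
    return math.atan((x_c - hi) / H), math.atan((x_c - lo) / H)


interval = admissible_projection((-0.3, 0.3, -0.3, 0.3), K=0.2, r1=0.55, r2=0.75)
print(interval)
print([math.degrees(a) for a in admissible_slopes(0.0, 0.5, interval)])
```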

Fig. 6 shows the ranges of variability of x_c for the permissible values of the coefficient r_1 for gaits 0, A, B and C.

Fig. 7 shows the analogous ranges for the relationship between x_c and r_2 (inequality (9)). The range of variability of x_c in gaits A, B and C is displaced towards positive values relative to the range shown in Fig. 7.

A displacement of x_c towards positive values corresponds to a steeper slope of the terrain while the machine is descending (α < 0.0).

It follows from this and from constraint (8) that gaits A, B and C can be used when the slope of the terrain is smaller than the slope permissible with gait 0. Analysis of relationship (9) (Fig. 7 a), b)) leads to the conclusion that, because x_c is displaced towards negative values, gaits A and C (Fig. 7) should be used for terrain slopes larger than the slope appropriate for gait 0. Gaits A and C restrict the motion capabilities (with respect to the permissible terrain slopes) in comparison with gait 0.







We have established that, because of condition (8), gait B should be used for smaller slopes than gait 0. When different machines of equal length and width are compared, gait 0 should be used when the machine's centre of gravity is shifted towards the back of the body; otherwise gait B should be used.

With respect to condition (9), gaits B and 0 are comparable.



Fig. 6. The range of variability of x_c as a function of r_1: a) gaits A, B, C; b) gait 0.

Fig. 7. The range of variability of x_c as a function of r_2: a) gaits A, B, C; b) gait 0.

Since the range of variability of x_c, the coordinate of the projection of the centre of gravity, is larger for gait 0, this sequence was adopted as the gait of the designed machine.

It is the only statically stable gait observed among quadruped animals, and is called the quadruped crawl. Horses, cows and turtles, for example, move in this way when moving very slowly.






3. The Choice of the Gait Model

Once the type of gait (the sequence of leg transfers) had been chosen, a method of describing it was developed [5]. This description relates the position of each leg to the current instant of motion, and takes into account that the machine's legs can move in one plane only.



The speed of motion is

$$v = \frac{4K + 4M_s}{3T} \qquad (10)$$

where T is the gait period (the time of execution of one sequence of leg transfers).

The step length K in the quadruped crawl depends, among other things, on the height of the body's centre of gravity above the ground and on the slope of the terrain.

The allowable range of the step length is determined by a set of inequalities of the following generic form:

$$K < f_1(d, x_c, H, M_s, D, DD, L, \alpha)$$

$$K < f_2(d, x_c, H, M_s, D, DD, L, \alpha) \qquad (11)$$

$$K < f_3(L, d, M_s)$$

where:

f_1, f_2, f_3 - linear functions of the listed variables [4],

d - distance between the footprints on the diagonal of the body (e.g. between the footprints of the left foreleg and the right hind leg); d is a constant parameter for a given gait and a given animal,

x_c - coordinate of the body's centre of gravity,

H - vertical distance of the centre of gravity from the ground (height of the machine),

D - maximum possible forward/backward shift of the leg,

DD - minimum possible forward/backward shift of the body,

L - length of the machine's body,

α - slope of the terrain.

The step length is the same for all legs. If we assume that at the instant t = 0 the leg touched the ground (it was at its maximum forward shift position, Fig. 8), then for t ≤ r_0 T the body moves relative to this leg by a distance:

- for either foreleg

$$d_f = \frac{d}{2} + x_c' + \frac{2K + 4M_s}{3} + vt \qquad (12)$$

- for either hind leg

$$d_h = L - \frac{d}{2} + x_c' + \frac{K + 2M_s}{3} + vt \qquad (13)$$

These distances are measured along the direction of motion. If at the instant t = 0 the leg was lifted off the ground (it had been at its maximum backward shift position, Fig. 8), then up to the instant t = (1 - r_0)T the leg end moves relative to the body by a distance S:

- for either foreleg

$$S = \frac{d}{2} + x_c' - \frac{K + 2M_s}{3} + \frac{(K + 2M_s)\,t}{T - r_0 T} \qquad (14)$$

- for either hind leg

$$S = L - \frac{d}{2} + x_c' - \frac{K + 2M_s}{3} + \frac{(K + 2M_s)\,t}{T - r_0 T} \qquad (15)$$

At the time t = (1 - r_0)T, the leg that has been transferred (has been above the ground) should touch the ground. At this moment the leg is at its maximum forward shift (Fig. 8).

Relationships (4) and (9)-(15) are used to construct the description of the machine's gait.
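Relations (14) and (15) describe a linear sweep of the leg end relative to the body during the transfer phase. The sketch below evaluates them, computing r_0 from formula (4); parameter names and numerical values are hypothetical. For a foreleg, the leg end advances by K + 2M_s during the swing time (1 - r_0)T and finishes at d/2 + x_c' + (2K + 4M_s)/3, the same offset that appears in (12) at touchdown.

```python
def swing_duration(K, Ms, T):
    """Swing time (1 - r0) T, with the duty factor r0 taken from formula (4)."""
    r0 = 0.75 + 3 * Ms / (2 * K + 4 * Ms)
    return (1 - r0) * T  # equals T - r0*T in (14)/(15)


def swing_offset(t, *, K, Ms, T, d, xc_prime, L=None):
    """Leg-end distance S relative to the body during the transfer (swing) phase.

    Uses (14) for a foreleg (L is None) or (15) for a hind leg (L = body length).
    t is measured from lift-off and must lie in [0, (1 - r0) T].
    """
    ts = swing_duration(K, Ms, T)
    if not 0 <= t <= ts:
        raise ValueError("t lies outside the swing phase")
    stroke = K + 2 * Ms  # leg-end travel during one swing
    base = d / 2 + xc_prime if L is None else L - d / 2 + xc_prime
    return base - stroke / 3 + stroke * t / ts


# Hypothetical values (metres, seconds).
params = dict(K=0.3, Ms=0.03, T=4.0, d=0.5, xc_prime=0.0)
ts = swing_duration(0.3, 0.03, 4.0)
print(swing_offset(0.0, **params), swing_offset(ts, **params))
```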



Fig. 8. Extremal positions of the machine's legs (figure labels: direction of motion; leg-end trajectory relative to the body).

4. Results of Investigations 

The correctness of the relationships describing the gait model (quadruped crawl) was tested by computer simulation. The method of modelling free gait was tested in the same way. We found that the search of the tree of possible motions is efficient in terms of data-processing speed.

No flaws were found in the method: satisfactory solutions were found for all tested situations. The simulation programs were coded in FORTRAN and executed on an IBM 360/370 compatible computer. The results of the simulation trials can be found in [4].

The program modelling the gait of the machine was coded in PASCAL and executed on an IBM PC microcomputer.



5. Conclusions

The proposed method of gait description and 
its parameter determination can be used in the 
case of any statically stable motion. 

The results are to be used in the investigation of a walking machine currently being carried out [4].

The gait modelling method described here yields a relationship between a machine's geometric parameters (body length, machine height, range of motion of each leg) and the gait parameters.

These relationships are useful both in designing a machine to realise an assumed type of motion and in determining the types of motion that a given machine can execute.

The gait description presented in this paper is easy to represent in the structure of a real control system. The computer simulation and recent experiments with the real machine confirm that the assumed procedure is correct.

The research was supported by the Polish Academy of Sciences (pr No. 7.1).







References 

1. Kumar V., Waldron: Gait analysis of walking machines for omnidirectional locomotion on uneven terrain. Preprint submitted for publication, 1988

2. Morecki A., Jaworek K., Pogorzelski W., Zielinska T., Fraczek J., Malczyk G.: Robotics system - elephant trunk type elastic manipulator combined with a quadruped walking machine. Second International Conference on Robotics and Factories of the Future. San Diego 1987

3. Raibert M.H.: Legged robots. Communications of the ACM 29(6), 499-514 (1986)

4. Zielinska T.: Modelling of the quadruped walking machine gait (in Polish). PhD Dissertation (advisor Morecki A.). Warsaw 1986

5. Morecki A., Busko Z., Jaworek K., Pogorzelski W., Zielinska T., Fraczek J., Malczyk G.: Robotics system: elastic manipulator combined with a quadruped walking machine. Seventh CISM-IFToMM Symposium on Theory and Practice of Robots and Manipulators. Udine 1988




Biped Locomotion by FNS: Control Issues and an ANN Implementation



Gideon F. Inbar 

Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel.



Abstract. Muscles paralyzed as a result of spinal injury may be activated to contract with the technique of Functional Neuromuscular Stimulation (FNS). The translation of such externally induced contraction into functional movements of the lower limbs of paralyzed individuals may be considerably improved by incorporating a controller into the feedback loop closed around the leg joints. Human locomotion is a complex movement that, even with a simplified model, exhibits many degrees of freedom, coupling between the joints, nonlinear characteristics in single-joint gains and in the overall system dynamics, etc. In the present work a simplified five-segment inverted pendulum model is used to demonstrate a joint decoupling scheme that could allow each joint to be controlled independently, using a model reference adaptive controller (MRAC). Linearization around the vertical standing position is used for simplification, despite the large angular displacements of the joints during normal locomotion.

MRAC implementation in real time, especially for tracking predetermined input signals, can be simplified using inverse dynamics adaptation. An artificial neural network (ANN) implementation of such a scheme, based on a perceptron-like learning rule, is outlined. The ANN controller's ability to track the parameters of a highly nonlinear system is demonstrated. The ANN controller can thus overcome the nonlinearities involved both in the neuromuscular mechanisms and in those generated by the mechanical coupling.

It is shown that the network in the adaptation mode can generalize for various inputs, converge from zero initial conditions in less than twenty iterations and maintain performance despite the loss of more than 50% of its synaptic connections.



1. Introduction 



Functional neuromuscular stimulation (FNS) is the application of a controlled 
electrical stimulus to the intact peripheral or intramuscular nerve in an attempt to 
replace upper motor neuron control which may be lost due to cerebral stroke, 
brain injury, tumor or traumatic spinal lesion. 

Over the past twenty years several FNS orthoses have been developed for the improvement of hand function and gait. Most are open loop, whereas the present work deals with a closed loop system. Recently various FNS systems have been developed to restrengthen the paralyzed muscles of the lower legs and to enable certain paraplegics to stand and execute a simple gait pattern [1,2,3]. Despite many years of great effort in many laboratories and countries, a solution to human gait control through FNS remains a distant prospect. What are the problems posed by the human body and by the task that we attempt to perform?

There are two different classes of problems: (a) those posed by the task, i.e. to generate a locomotion pattern in the desired direction of progression while simultaneously maintaining body stability in space; and (b) those posed by the anatomy and physiology of the body. The latter consist of nonlinearities in the individual muscle gain characteristics; nonstationarities in the static and dynamic characteristics of the neuromuscular systems; anatomical and physiological constraints that must be observed, such as the range of motion, torque capabilities and time constants; mechanical coupling of joints through muscles that work across more than one joint; heavy noise in the frequency band of operation due to spasticity; etc.

To overcome the problems posed by the human body, specified here as class (b), we have proposed the use of adaptive controllers [4,5] and of modified controllers to cope with the system noise [6,7]. These references give a complete outline of the system used, the stimulation parameters, and the characterization of the static and dynamic properties of the neuromuscular systems. The synthesis of the individual joint controllers is therefore not discussed in the present paper.

In the present paper the more general problem of global controller design is addressed. A biped locomotion model is briefly outlined, mainly in order to emphasize the complexity of the problem at hand.

The first stage of controller synthesis consists of obtaining nominal programmed control inputs from the inverse model of the biped system. The inverse model was driven by reference angular trajectories (nominal trajectories) measured from healthy individuals walking slowly, using an electro-goniometer system [8]. This stage amounted essentially to synthesizing open loop control inputs using the complete system dynamics, but assuming no perturbations to the system.

The second stage in the design process consists of developing a controller capable of tracking the nominal trajectories. A brief review is given of the design of the state feedback controller for the single support phase, which ensures a globally decoupled closed loop system with eigenvalues at desired locations. A somewhat similar approach is used for the double support phase; this phase and the problems associated with transitions between control modes are not discussed here.

A simpler control scheme, more compatible with normal CNS (central nervous system) control of locomotion, was suggested by us some time ago [9, 10]. This is basically an inverse dynamic control scheme, in which the feedback signals are used to update the inverse model of the plant, i.e. of the legs. An artificial neural network implementation of inverse dynamic control is outlined. The network architecture is given for the special thermometer representation of analog signals [11]. The network's performance for nonlinear system dynamics and its ability to generalize for various input signals are demonstrated. Finally, the robustness of the network is studied with up to 70% synaptic failure.
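The thermometer representation mentioned above can be illustrated with a short sketch. It assumes a common form of the code, in which an analog value in a known range switches on a proportional number of binary units starting from the first; the exact quantisation used in [11] may differ, and the function names are hypothetical.

```python
def thermometer_encode(value, lo, hi, n_units):
    """Encode an analog value in [lo, hi] as n_units binary units, filled from the first unit."""
    if not lo < hi or n_units < 1:
        raise ValueError("need lo < hi and at least one unit")
    clipped = min(max(value, lo), hi)  # values outside the range saturate
    active = round((clipped - lo) / (hi - lo) * n_units)
    return [1] * active + [0] * (n_units - active)


def thermometer_decode(code, lo, hi):
    """Recover the quantised value from the number of active units."""
    return lo + (hi - lo) * sum(code) / len(code)


print(thermometer_encode(0.5, 0.0, 1.0, 4))        # [1, 1, 0, 0]
print(thermometer_decode([1, 1, 0, 0], 0.0, 1.0))  # 0.5
```

Because this decoder only counts active units, corrupting any single unit moves the decoded value by exactly one quantisation step, (hi - lo)/n_units.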







2. The Biped Model and State Feedback Controller for a Single Leg Support Phase



Why use a biped model? Two reasons justify the effort needed to develop such a model:

1. To investigate problems of postural stability and periodic motion of the human body under external control.

2. To use inverse dynamic modeling to generate the required control signals based on the given joint trajectories in slow normal human walking. The concept of inverse dynamic modeling for adaptive neuromuscular control was introduced by us long ago [9], and its application, using normal gait [8], to the control of the knee joint in paraplegics was reported recently [6].

Models of human locomotion have been developed before [12]. Such models describe the agonist/antagonist muscle pairs as torque generators acting around single degree of freedom (DOF) hinge joints. Human walking may involve 20 DOF, a number which is very difficult to handle even with presently available software tools. We therefore use a simplified compound inverted pendulum with 5 DOF (two lower legs, two upper legs and torso), with movement constrained to the sagittal plane. Biped locomotion is complicated by movement of the structural support point, i.e. the periodic exchange of support from one leg to the other. For brevity, we describe here only one single support cycle, and ignore the switching of phases and the double leg support phase.

2.1 The equations of motion for the single leg support phases 



The single leg support phases are those sections of the locomotion cycle in which 
the biped contacts the ground at one point only. Equations were derived using 
Lagrange’s method for an assemblage of interconnected linkages. The positions 
of the centers of mass of each segment were described with respect to the origin 
of the coordinate system. 

Lagrange's formula was written as

$$M_{q_r} = \frac{d}{dt}\left(\frac{\partial T}{\partial \dot{q}_r}\right) - \frac{\partial T}{\partial q_r} + \frac{\partial V}{\partial q_r} \qquad (1)$$

where

M_{q_r} are the generalized forces,

q_r are the generalized coordinates,

T is the kinetic energy of the system,

V is the potential energy of the system.
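As a worked illustration of (1) (not the authors' five-segment derivation), consider a single rigid link pivoting about its lower end, with angle θ measured from the vertical as in the biped model, mass m, centre of mass at distance k from the pivot and moment of inertia J about the pivot (symbols chosen here for illustration). With T = ½Jθ̇² and V = mgk cos θ:

$$
\begin{aligned}
\frac{d}{dt}\left(\frac{\partial T}{\partial \dot\theta}\right) &= J\ddot\theta, \qquad \frac{\partial T}{\partial \theta} = 0, \qquad \frac{\partial V}{\partial \theta} = -mgk\sin\theta,\\
M_\theta &= J\ddot\theta - mgk\sin\theta \;\approx\; J\ddot\theta - mgk\,\theta \quad (\theta \to 0).
\end{aligned}
$$

With no actuator torque (M_θ = 0) the linearized link obeys θ̈ = (mgk/J)θ, whose characteristic roots s = ±√(mgk/J) form a real pair placed symmetrically about the origin, the signature of an inverted pendulum.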







Due to the complexity of the system (5 generalized coordinates), the equations of motion were derived using the symbolic programming language REDUCE [13]. The five-D.O.F. model gave five nonlinear equations as functions of the angles θ_i (i = 1, 2, ..., 5) and of the segment masses, inertias and lengths depicted in Fig. 1. The final stage in obtaining the equations of motion involved expressing the generalized moments in terms of the virtual work performed by the non-conservative actuator torques. This gave the explicit coupling between the D.O.F., expressed as the moment sum at the joints.

For ease of analysis only linearized equations were used. The nonlinear equations were thus linearized with respect to the variables θ_i (i = 1, 2, ..., 5) and then transformed into the standard state-space representation below:

$$\dot{x} = Ax + Bu, \qquad y = Cx \qquad (2)$$

where

$$x = [\theta_1, \dot\theta_1, \theta_2, \dot\theta_2, \theta_3, \dot\theta_3, \theta_4, \dot\theta_4, \theta_5, \dot\theta_5]^T$$

is the state vector, and linearization was performed around the state





Segment i   m_i        I_i            l_i        k_i
1           4.55 kg    0.105 kg m²    0.502 m    0.267 m
2           7.63 kg    0.089 kg m²    0.431 m    0.247 m
3           49.0 kg    2.35 kg m²     0.28 m     -
4           7.63 kg    0.089 kg m²    0.431 m    0.184 m
5           4.55 kg    0.105 kg m²    0.502 m    0.235 m

Fig. 1: The five segment biped model.







$$x = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]^T \qquad (3)$$

which, since all angles are referenced to the vertical, represents the biped in an 
upright posture. 

The dynamics matrix A, the input matrix B and the output matrix C are shown in Fig. 2. It is immediately apparent that, in the stance phase, the system has 6 inputs and 5 outputs.



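$$A = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
394 & 0 & -383 & 0 & 9 & 0 & 4 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-484 & 0 & 533 & 0 & -40 & 0 & -15 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
24 & 0 & -83 & 0 & 49 & 0 & 10 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
-40 & 0 & 127 & 0 & -42 & 0 & -70 & 0 & 23 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
22 & 0 & -56 & 0 & 16 & 0 & 76 & 0 & -56 & 0
\end{bmatrix}, \qquad
B = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 & 0
\end{bmatrix}$$

$$C = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}$$

Fig. 2: The A, B and C matrices for the linearized equations of motion of the biped in the single leg support phase.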

Furthermore, for each of the two single leg support phases, either column 1 or column 5 will contain all zeros, since only one of the ankle actuator torques may be active at any one time. The stability of the single leg support phases is best seen from a plot of the open-loop pole positions (eigenvalues of the matrix A) in the complex s-plane (Fig. 3). These poles are distributed symmetrically about the origin on the real and imaginary axes and show that the biped is highly unstable.

Open-loop poles (Re, Im): (30.5, 0), (0.16, 0), (4.70, 0), (0, 10.1), (0, -10.1), (0, 4.27), (0, -4.27), (-4.70, 0), (-0.16, 0), (-30.5, 0).

Fig. 3: The open-loop pole positions for the biped single leg support phase.

The poles λ1 and λ2 represent the entire structure balancing on the lower leg of the support foot. λ3 and λ4 may be attributed to that section of the model balancing on the support knee, and the outermost poles on the real axis, λ5 and λ6, represent the torso balancing on the entire supporting leg. All these poles are characteristic of a series of inverted pendulums, each balancing on the one below it. The two pairs of complex poles, λ7, λ8 and λ9, λ10, are characteristic of regular pendulums, i.e. they display a purely oscillatory response, and represent, respectively, the thigh and shank of the swing leg. The open-loop system dynamics are clearly unstable and require the addition of feedback for stability. An additional feature of the open-loop system is the coupling between the D.O.F.
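The pole pattern described above can be reproduced for the simplest case. The sketch below (an illustration, not the authors' five-segment computation) linearizes a single rigid link about the vertical, giving x' = A x with x = [θ, θ'] and A = [[0, 1], [a, 0]]. An inverted link (a = mgk/J > 0) has a real pole pair ±√a, one of them unstable, while a hanging link (a < 0) has a purely imaginary pair, like the regular-pendulum poles of the swing leg. The mass m, pivot-to-centre-of-mass distance k and moment of inertia J about the pivot are hypothetical inputs, and the function name is illustrative.

```python
import cmath


def single_link_poles(m, k, J, g=9.81, inverted=True):
    """Open-loop poles of one rigid link linearized about the vertical.

    m: mass [kg]; k: pivot-to-centre-of-mass distance [m];
    J: moment of inertia about the pivot [kg m^2]; g: gravity [m/s^2].
    """
    # Linearized dynamics J*theta'' = +/- m*g*k*theta, written as x' = A x with
    # x = [theta, theta'] and A = [[0, 1], [a, 0]]; the characteristic polynomial is s^2 - a.
    a = m * g * k / J if inverted else -m * g * k / J
    root = cmath.sqrt(a)
    return root, -root


# Hypothetical link: 4.55 kg, centre of mass 0.267 m from the pivot, J = 0.43 kg m^2.
print(single_link_poles(4.55, 0.267, 0.43))                  # real pair: unstable
print(single_link_poles(4.55, 0.267, 0.43, inverted=False))  # imaginary pair: oscillatory
```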

2.2 Theoretical considerations in designing the closed-loop system 



In linear multivariable (multi-input multi-output) systems, a change in any input will usually result in changes in all output variables. For the purposes of simulation and control, it is essential to obtain a system in which such interaction between controls does not occur. The design objective of non-interacting (decoupled) systems is therefore to obtain a system in which each input affects only one output. The primary advantage of such a design is that, once non-interaction is achieved, the system is reduced to a number of single-input single-output (SISO) channels, or sub-systems, to which well-established controller design techniques may be applied. Two design problems therefore need to be solved. The first is concerned with decoupling the high (10th) order biped model into a number of lower order independent sub-systems. The second requires designing each independent sub-system so that certain performance criteria are satisfied. The design procedure adopted [14, 15] enabled both these stages to be combined into a single procedure.

The design setting is linear state-space, with the mathematics consisting chiefly of matrix algebra. The state is a 10 × 1 vector and the dynamics matrix is a 10 × 10 square matrix. The design objective is to obtain a 5 × 10 state feedback matrix F such that complete decoupling and pole placement in each of the independent sub-systems may be achieved. The symbolic programming language REDUCE was employed to perform all the symbolic matrix calculations.

The feedback scheme, showing the matrices to be designed, F and G, is shown in Fig. 4. It can be shown that there exists a pair [F, G] which will decouple the system. The gains in the feedback matrices are in N m/rad. Fig. 5 shows the closed-loop transfer function matrix R(s), calculated from

$$R(s) = C\,[sI - A - BF]^{-1} BG \qquad (4)$$

where I is a 10 × 10 identity matrix.

Thus a one-to-one correspondence between the input and output variables has been achieved, since R(s) is diagonal, with all off-diagonal elements equal to zero.

Fig. 4: The state feedback decoupling scheme.

The elements on the diagonal of R(s) represent the transfer functions of the individual joint subsystems and, since all off-diagonal elements are zero, show that complete decoupling between the subsystems has been achieved. The m_{1i} and m_{0i} (i = 1, ..., 5) terms in each individual transfer function, and the individual gain γ_{0i}, may be chosen to locate the closed-loop poles of each joint subsystem and so meet any desired performance criteria, such as critical damping or fast rise time. However, the freedom in choosing the closed-loop pole positions and gain terms is limited by their direct influence on the magnitude of the feedback gains (the elements of the F matrix), and so must be exercised with the capabilities of each joint actuator in mind.



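$$R(s) = \begin{bmatrix} \dfrac{\gamma_{01}}{s^2 - m_{11}s - m_{01}} & 0 & 0 & 0 & 0 \\ 0 & \dfrac{\gamma_{02}}{s^2 - m_{12}s - m_{02}} & 0 & 0 & 0 \\ 0 & 0 & \dfrac{\gamma_{03}}{s^2 - m_{13}s - m_{03}} & 0 & 0 \\ 0 & 0 & 0 & \dfrac{\gamma_{04}}{s^2 - m_{14}s - m_{04}} & 0 \\ 0 & 0 & 0 & 0 & \dfrac{\gamma_{05}}{s^2 - m_{15}s - m_{05}} \end{bmatrix}$$

Fig. 5: The closed-loop transfer function matrix R(s).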
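The decoupling construction can be illustrated on a much smaller system. The following is a minimal Python sketch of the classical state-feedback decoupling construction, in the spirit of the procedure cited [14, 15]. It assumes that every output has relative degree two, i.e. c_i B = 0 and the rows c_i A B form a nonsingular matrix. Choosing F and G so that each output obeys y_i'' = m_1i y_i' + m_0i y_i + γ_0i v_i yields exactly the diagonal form of Fig. 5. The two-joint mass-spring model, its numbers, and the helper names (matmul, inverse, decouple, closed_loop_tf) are hypothetical. This is not the authors' REDUCE code or their 10-state biped model.

```python
# Sketch: state feedback u = F x + G v that decouples outputs of relative degree 2,
# giving R(s) = diag(gamma_i / (s^2 - m1_i s - m0_i)) as in Fig. 5.


def matmul(X, Y):
    """Plain-list matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse(X):
    """Gauss-Jordan inverse with partial pivoting (also works for complex entries)."""
    n = len(X)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(X)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def decouple(A, B, C, m1, m0, gamma):
    """Return (F, G); assumes c_i B = 0 and that the rows c_i A B form a nonsingular matrix."""
    n, p = len(A), len(C)
    CA = matmul(C, A)
    CA2 = matmul(CA, A)
    Bstar_inv = inverse(matmul(CA, B))
    # Force y_i'' = m1_i * y_i' + m0_i * y_i + gamma_i * v_i for every output i.
    target = [[m1[i] * CA[i][j] + m0[i] * C[i][j] - CA2[i][j] for j in range(n)]
              for i in range(p)]
    F = matmul(Bstar_inv, target)
    G = matmul(Bstar_inv, [[gamma[i] if i == j else 0.0 for j in range(p)] for i in range(p)])
    return F, G


def closed_loop_tf(A, B, C, F, G, s):
    """Evaluate R(s) = C [sI - A - BF]^{-1} B G at a single complex frequency s."""
    n = len(A)
    BF = matmul(B, F)
    M = [[(s if i == j else 0) - A[i][j] - BF[i][j] for j in range(n)] for i in range(n)]
    return matmul(matmul(C, inverse(M)), matmul(B, G))


# Hypothetical two-joint model: mass @ q'' + stiff @ q = u, state x = [q1, q2, q1', q2'].
mass = [[2.0, 0.5], [0.5, 1.0]]
stiff = [[30.0, -10.0], [-10.0, 10.0]]
Minv = inverse(mass)
MK = matmul(Minv, stiff)
A = [[0, 0, 1, 0], [0, 0, 0, 1],
     [-MK[0][0], -MK[0][1], 0, 0], [-MK[1][0], -MK[1][1], 0, 0]]
B = [[0, 0], [0, 0], Minv[0], Minv[1]]
C = [[1, 0, 0, 0], [0, 1, 0, 0]]

m1, m0, gamma = [-6.0, -20.0], [-18.0, -100.0], [18.0, 100.0]
F, G = decouple(A, B, C, m1, m0, gamma)
s = 1 + 2j
R = closed_loop_tf(A, B, C, F, G, s)
print(abs(R[0][1]), abs(R[1][0]))  # both ~0: the two loops no longer interact
```

With γ_0i = -m_0i, each loop has unit steady-state gain. Any (m_1i, m_0i) pair can be substituted, subject to the actuator-gain limits mentioned in the text.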



2.3 Postural stability and the initiation of gait 

The design procedure described in the preceding sections was implemented to
provide both complete decoupling between the biped’s D.O.F. and arbitrary
closed-loop pole-placement. The selected closed-loop dynamics are briefly
described below.

It was decided to keep the poles representing the swinging leg’s dynamics
on the imaginary (jω) axis, but to relocate them slightly so as to match more
closely the stride length, l_g, and the velocity of a slow periodic gait. The
calculated positions were λ_7, λ_8 (thigh dynamics) = ±j6 and λ_9, λ_10 (shank
dynamics) = ±j10. Since the objective was to simulate paraplegic walking, the
poles representing the torso dynamics were shifted to the far left-hand side of
the s-plane in order to rigidly stabilize the upper body in an upright position.
This closely resembles the state of the upper body in F.N.S.-assisted standing of
paraplegics. The chosen positions were λ_5, λ_6 = -10, which ensured that
θ_3 ≈ 0. An F.N.S.-assisted standing paraplegic may introduce disturbances into
his dynamic stability by, for example, moving the positions of his arms on
supporting parallel bars. These disturbances can be modeled by feeding random
noise sequences into the torso sub-system.

The supporting leg sub-system dynamics were chosen so as to provide optimum
damping and reasonably fast correction of angular deviations. To this end, the
knee sub-system poles λ_3, λ_4 were placed at -3 ± j3, while those of the ankle,
λ_1, λ_2, were placed at -10 and 0 in order to ensure zero steady-state error for
inputs up to second order (ramp). It should be emphasized that, because the
symbolic form of the system design equations for the matrices F and G was
retained, any change in the desired closed-loop dynamics may be implemented very
quickly by solving the appropriate quadratic equations in R(s) (Fig. 5).
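Each decoupled loop in Fig. 5 has the form γ_0i / (s² - m_1i s - m_0i). A chosen pole pair p_1, p_2 therefore fixes m_1i = p_1 + p_2 and m_0i = -p_1 p_2. The short Python sketch below applies this to the pole positions quoted above; the function names are hypothetical. It also shows that the knee pair -3 ± j3 has a damping ratio of 1/√2 ≈ 0.707. Reading this as the intended "optimum damping" is our interpretation and is not stated in the text.

```python
import math


def coefficients_from_poles(p1, p2):
    """Return (m1, m0) such that s^2 - m1*s - m0 = (s - p1)(s - p2), the Fig. 5 form."""
    p1, p2 = complex(p1), complex(p2)
    m1 = p1 + p2
    m0 = -(p1 * p2)
    if abs(m1.imag) > 1e-12 or abs(m0.imag) > 1e-12:
        raise ValueError("poles must be real or a complex-conjugate pair")
    return m1.real, m0.real


def natural_freq_and_damping(m1, m0):
    """Write s^2 - m1*s - m0 as s^2 + 2*zeta*wn*s + wn^2 (needs -m0 > 0); return (wn, zeta)."""
    wn = math.sqrt(-m0)
    return wn, -m1 / (2 * wn)


# Closed-loop pole pairs quoted in Section 2.3 (lambda_1 ... lambda_10).
pole_pairs = {
    "ankle": (-10, 0),            # lambda_1, lambda_2
    "knee": (-3 + 3j, -3 - 3j),   # lambda_3, lambda_4
    "torso": (-10, -10),          # lambda_5, lambda_6
    "thigh": (6j, -6j),           # lambda_7, lambda_8
    "shank": (10j, -10j),         # lambda_9, lambda_10
}
coeffs = {name: coefficients_from_poles(*p) for name, p in pole_pairs.items()}
for name, (m1, m0) in coeffs.items():
    print(f"{name}: m1 = {m1:g}, m0 = {m0:g}")
```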

Another possible approach to locomotion design involves keeping the support-leg
sub-system unstable. This makes the entire single-leg support phase unstable,
ensuring that the biped will fall forward if given the correct initial
conditions. By controlling the swinging leg’s trajectory, the biped’s fall may be
“arrested”, allowing the double-support phase controllers to take over dynamic
control of the biped.

Once the system is decoupled, each individual joint control system may be fed
with the appropriate input signals using the inverse dynamics approach, in which
slow-gait angular trajectories are fed to the inverse dynamics of the joint plant.
The plant inverse dynamics are then adapted, together with those of the
controller, based on joint feedback signals. An ANN implementation is described
next.



3. Control by Inverse Dynamics Adaptation and an Artificial Neural 
Network Implementation 



Control through inverse dynamics is a rather simple concept [9]. Suppose the
desired system response is specified by a reference model, T(s), driven by an
input signal, U(s), supplied by the CNS. The task is to find the signal to be
supplied to the plant (the neuromuscular dynamics), G(s), so that it produces the
same response, L(s):

$$L(s) = T(s)\,U(s)$$

The plant response to the same input is, in general,

$$L'(s) = G(s)\,U(s) \neq L(s)$$

A signal conditioner is therefore needed to generate a new signal U'(s) such
that

$$U'(s) = \frac{T(s)}{G(s)}\,U(s)$$

It is clear, then, that if U'(s) is applied to G(s), its output will be the
desired one:

$$U'(s)\,G(s) = U(s)\,\frac{T(s)}{G(s)}\,G(s) = U(s)\,T(s) = L(s)$$

Naturally, both T(s) and G(s) must be well known, and T(s)/G(s) must define a
realizable system. It is also clear that for tracking, T(s) = 1, and the signal
conditioner is then the “inverse dynamics” of the plant. Adaptation by inverse
dynamics and its implementation are discussed, for example, in Widrow &
Stearns [20].
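A discrete-time Python sketch of the signal conditioner follows. It assumes a hypothetical first-order plant G and a hypothetical first-order reference model T; neither is the neuromuscular model of the text. The conditioner passes U through T and then through the exact plant inverse, so the plant output reproduces L = T U, whereas feeding U directly gives L' ≠ L. For pure tracking (T = 1) one would drop the reference model and keep only plant_inverse. This inverse is realizable only because b ≠ 0 and the plant has no pure delay.

```python
import math

A_PLANT, B_PLANT = 0.5, 2.0   # hypothetical plant G:  y(n) = a*y(n-1) + b*u(n)
C_MODEL = 0.8                 # hypothetical model T:  l(n) = c*l(n-1) + (1-c)*u(n)


def plant(u, a=A_PLANT, b=B_PLANT):
    """Simulate the plant G from zero initial conditions."""
    y, prev = [], 0.0
    for un in u:
        prev = a * prev + b * un
        y.append(prev)
    return y


def reference_model(u, c=C_MODEL):
    """Simulate the reference model T from zero initial conditions."""
    out, prev = [], 0.0
    for un in u:
        prev = c * prev + (1 - c) * un
        out.append(prev)
    return out


def plant_inverse(l, a=A_PLANT, b=B_PLANT):
    """G^{-1}: the input sequence that makes the plant output equal l (requires b != 0)."""
    u, prev = [], 0.0
    for ln in l:
        u.append((ln - a * prev) / b)
        prev = ln
    return u


def signal_conditioner(u):
    """U'(s) = T(s)/G(s) * U(s): pass U through T, then through the plant inverse."""
    return plant_inverse(reference_model(u))


u = [math.sin(0.3 * n) for n in range(50)]   # hypothetical CNS input U
desired = reference_model(u)                  # L  = T U
achieved = plant(signal_conditioner(u))       # G U'
naive = plant(u)                              # L' = G U
```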

We will not elaborate further on this point, except to stress the ample evidence,
available today and long ago [10], that the CNS has the feedback
information necessary to determine the plant dynamics.

It was previously demonstrated that the plant at hand is of high order, nonlinear,
and has changing dynamics. Next we examine whether an ANN can identify such a
system’s inverse dynamics in order to generate the needed control signal.



3.1 ANN architecture for inverse dynamic adaptation 



The artificial neural network (ANN) used for the inverse dynamic adaptation is
not detailed here (see [11, 17]), except to mention that it is a single-layer
network of binary neurons, all fed by the same inputs. The general block diagram
of an inverse dynamic controller is shown in Fig. 6. There, the adaptive inverse
model, both in the identification loop and in the feedforward pathway, is
implemented by the ANN, as shown in Fig. 7. The learning algorithm explicitly
minimizes the errors in the output of the NN, defined as the difference between
the discretized training signal vector, X_tr, and the output vector of the NN
driven by the discretized output of the plant. Further details of the adaptation
learning mode are shown in Fig. 7(a). The NN consists of N binary neurons, N
being the number of quantization levels chosen to represent the analog value of
the signals in the system.




Fig. 6: A block diagram of the inverse model adaptation scheme.



Using vector notation, the input-output relation of these neurons can be written as

$$U_i(n) = W_i\,I(n) - T_i; \qquad X(n) = g\{U(n)\} \qquad (5)$$

where W_i is a row vector whose j-th component corresponds to the strength of
the synaptic connection between the j-th entry of the binary input state vector
at time n, I(n), and the i-th neuron; T_i is the threshold of the i-th neuron; X
is the output state vector; U is the post-synaptic potential vector; and g(·) is
applied on an element-by-element basis.

I(n) consists of a set of delayed inputs, Y(n), ..., that evolve from the
unknown plant driven by the training signal, together with the delayed outputs,
X(n-1), ..., of the NN itself, as seen in Fig. 7(a); Y(n) here is C(t) of Fig. 6.
The feedback and delays are essential properties of this scheme that enable it
to model unknown discrete dynamic systems. The number of delays should match the
order of the plant. This is the only information about the plant that must be
known a priori, and it can also be estimated successively by starting with a
larger number of delays.
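The structure of I(n) can be sketched as a tapped delay line. The Python class below is a hypothetical helper, not part of the original work. It assumes that every signal has already been coded as an N-bit binary vector. It also assumes that "p delays on the external input" means Y(n), ..., Y(n-p) are all presented to the network; the text leaves the exact convention open. Under that reading, the Fig. 8 network (two delays on the external input, one on the state vector) corresponds to input_delays=2, state_delays=1.

```python
from collections import deque


class DelayLineInput:
    """Build I(n) = [Y(n), ..., Y(n-p), X(n-1), ..., X(n-q)] as one flat binary list.

    Hypothetical helper: p = input_delays, q = state_delays, n_bits = bits per coded sample.
    """

    def __init__(self, n_bits, input_delays, state_delays):
        zero = [0] * n_bits
        # Newest sample at index 0; deques with maxlen drop the oldest sample automatically.
        self.y_hist = deque([zero] * (input_delays + 1), maxlen=input_delays + 1)
        self.x_hist = deque([zero] * state_delays, maxlen=state_delays)

    def build(self, y_now):
        """Push the newest coded plant output Y(n) and return the flattened I(n)."""
        self.y_hist.appendleft(list(y_now))
        parts = list(self.y_hist) + list(self.x_hist)
        return [bit for part in parts for bit in part]

    def push_output(self, x_now):
        """Record the NN output X(n) so that it appears as X(n-1) at the next step."""
        self.x_hist.appendleft(list(x_now))


line = DelayLineInput(n_bits=3, input_delays=2, state_delays=1)
i0 = line.build([1, 0, 0])      # I(0): only Y(0) is non-zero
line.push_output([1, 1, 0])     # NN output X(0)
i1 = line.build([1, 1, 1])      # I(1) = [Y(1), Y(0), Y(-1), X(0)]
```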



The learning algorithm: The problem is defined as finding the synaptic
weights, W_ij, and thresholds, T_i, that realize the mapping of the input
training vectors I_tr(n) to the NN output vectors X_tr(n), n = 1, ..., L. One way
to solve the problem is by an iterative learning algorithm based on the
perceptron learning scheme [17].

1. Initialization: Choose arbitrary W_ij(0), T_i(0).

2. The k-th iteration: For each I_tr(n), n = 1, ..., L, measure the output

$$X_i(n) = g\{W_i(k)\,I_{tr}(n) - T_i\} \qquad (6)$$

Define the error

$$E_i(n) = X_{tr,i}(n) - X_i(n); \qquad i = 1, \ldots, N \qquad (7)$$

Update the weights and thresholds

$$\Delta W_i = \eta\,E_i(n)\,I_{tr}(n) \qquad (8)$$

$$\Delta T_i = -\eta\,E_i(n) \qquad (9)$$

η is the step size of the algorithm. Note that the parameters W_ij and T_i are
updated L times during each iteration, where L is the number of transitions in
the training set. It can be shown that this algorithm converges if a solution
exists [17]; when no solution exists, a gradient descent search is performed to
minimize a certain cost function [18].

Fig. 7(a): The ANN architecture in adaptation mode.

Fig. 7(b): The ANN architecture in control mode.
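The learning steps (6) to (9) can be sketched in Python as follows. This is a minimal illustration under our own assumptions. g(·) is taken as a unit step that outputs 1 for a positive potential. Weights and thresholds start at zero, which is one arbitrary initialization. Training stops after the first pass with no errors. The toy targets are hypothetical and unrelated to the plant data of the text: the AND and the OR of three input bits, both linearly separable. For a non-separable mapping this loop simply stops at max_iters.

```python
from itertools import product


def step(u):
    """Binary neuron nonlinearity g(.): 1 if the post-synaptic potential is positive, else 0."""
    return 1 if u > 0 else 0


def train(I_tr, X_tr, eta=1.0, max_iters=1000):
    """Perceptron-style training of N binary neurons following eqs. (6)-(9).

    I_tr: L binary input vectors; X_tr: L binary target vectors of length N.
    Returns (W, T, iterations_used).
    """
    n_in, N = len(I_tr[0]), len(X_tr[0])
    W = [[0.0] * n_in for _ in range(N)]
    T = [0.0] * N
    for k in range(1, max_iters + 1):
        errors = 0
        for I, target in zip(I_tr, X_tr):                  # L updates per iteration
            for i in range(N):
                X_i = step(sum(w * x for w, x in zip(W[i], I)) - T[i])   # eq. (6)
                E_i = target[i] - X_i                                      # eq. (7)
                if E_i:
                    errors += 1
                    W[i] = [w + eta * E_i * x for w, x in zip(W[i], I)]   # eq. (8)
                    T[i] -= eta * E_i                                      # eq. (9)
        if errors == 0:
            return W, T, k
    return W, T, max_iters


I_tr = [list(bits) for bits in product([0, 1], repeat=3)]
X_tr = [[int(all(b)), int(any(b))] for b in I_tr]      # neuron 1: AND, neuron 2: OR
W, T, iters = train(I_tr, X_tr)
outputs = [[step(sum(w * x for w, x in zip(W[i], I)) - T[i]) for i in range(2)] for I in I_tr]
```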

Signal representation: A thermometer representation of the signals is used
in the present work. Signal representation in an ANN is an important issue.
The thermometer representation, or threshold decomposition representation [21],
was selected for its distributed character, i.e. the number of bits equals the
number of quantization levels (analog levels). It has the additional important
property that every bit in this representation has the same weight, unlike the
usual binary or “mother cell” representations. This allows a simple
self-correction network to be incorporated in the control mode, as shown in
Fig. 7(b).
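A minimal Python sketch of the thermometer code follows. It assumes a signal range [lo, hi] mapped uniformly onto N levels, with level k coded as k leading ones, so that N bits give N levels as in the text. The function names and the uniform quantizer are assumptions for illustration. Because the decoded value depends only on the number of ones, a single flipped bit, wherever it falls, moves the value by exactly one quantization step. In an ordinary binary code, a flipped high-order bit causes a large jump.

```python
def thermometer_encode(value, lo, hi, n_levels):
    """Quantize value in [lo, hi] to level k in 1..n_levels; return k ones followed by zeros."""
    value = min(max(value, lo), hi)
    k = 1 + round((value - lo) / (hi - lo) * (n_levels - 1))
    return [1] * k + [0] * (n_levels - k)


def thermometer_decode(bits, lo, hi):
    """Every bit carries the same weight: the level is just the number of ones."""
    n_levels = len(bits)
    k = sum(bits)
    return lo + (k - 1) * (hi - lo) / (n_levels - 1)


code = thermometer_encode(0.0, -1.0, 1.0, 31)   # N = 31 levels, as in the simulations
value = thermometer_decode(code, -1.0, 1.0)
```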



3.2 The control mode with ANN 



In the control mode the ANN acts as an inverse of the plant. The ANN learns the
plant from a training signal, and it is therefore imperative that the network
is able to generalize, i.e. learn to respond correctly to input sequences it has
not specifically been trained on. Because of the particular signal representation
used, it is possible to improve the network’s generalization ability by
introducing a pre-programmed error-correcting network. Such a network could be a
content addressable memory (CAM) of the Hopfield type [18], in which the valid
thermometer-representation binary vectors are the stable states. Instead, a
feedforward layered NN programmed to act as a Hamming net [19] was selected.
Whenever the NN output is not one of the thermometer binary vector
representations, the Hamming network produces the valid thermometer
representation vector that is closest to the output state vector in the Hamming
distance sense. The system in control mode is shown in Fig. 7(b).
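The correction performed by the Hamming network can be stated directly. The Python sketch below does not implement a Hamming net. It is a brute-force equivalent, assumed for illustration, that returns the same answer by comparing the NN output against all N valid thermometer vectors (k leading ones, k = 1, ..., N). Ties, which the text does not discuss, are broken toward the smaller level here.

```python
def nearest_thermometer(bits):
    """Return the valid thermometer vector (k ones then zeros, k = 1..N) nearest in Hamming distance.

    Brute-force search over all N valid codes; ties go to the smallest k.
    """
    n = len(bits)
    best_k, best_dist = 1, None
    for k in range(1, n + 1):
        candidate = [1] * k + [0] * (n - k)
        dist = sum(a != b for a, b in zip(bits, candidate))
        if best_dist is None or dist < best_dist:
            best_k, best_dist = k, dist
    return [1] * best_k + [0] * (n - best_k)


corrected = nearest_thermometer([1, 1, 1, 0, 1, 1, 1, 0])   # one stray 0 inside the run of ones
```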



Simulation results: In a series of computer simulations, the learning and
generalization abilities of the NN model identifying the inverse dynamics of
linear and nonlinear plants were examined. A nonlinear plant described by

$$y(n) = [0.7\,y(n-1) - 0.75\,y(n-2)]^3 - x(n) \qquad (10)$$

was used to show the performance of the system. N, the number of quantization
levels and of neurons, was set to 31. The training signal, a uniformly
distributed white noise sequence, was 200 samples long. The improvement in
tracking the random training signal with an increasing number of learning
iterations is shown in Fig. 8. In (a) the training sequence is shown as a bold
line and the ANN output before training as a dotted line. In (b) the dotted
output is achieved after 20 training iterations. The generalization power of the
network is shown in (c), where the network tracks a new random sequence. The
ability of the network to track a periodic signal is shown in (d). The network
was checked for robustness by randomly eliminating synapses (setting a certain
percentage of the W’s to zero). With 30% synaptic loss the network still
performed well, although the generalization errors doubled, and it almost
regained its initial performance after retraining.

The complexity of the plant should be kept in mind when judging the results.
The ANN architecture proposed here was shown to have two important properties,
namely the ability to generalize and robustness. This architecture can easily be
implemented in special-purpose hardware in order to function in real time. The
major drawback of this architecture is the separation between the learning and
identification phase and the control phase. This separation also accounts for the
inferior tracking ability. What is needed is a scheme that combines the two
phases so that learning is carried out simultaneously with control. In such a
scheme the ANN synaptic weights would be continuously updated to fit the load and
input signal. This would improve tracking accuracy, especially in the face of
changing plant parameters. Such a scheme has recently been implemented [21].

Fig. 8: Performance of an ANN with 31 neurons, two delays on the external input
and one on the state vector. Length of the training set is 200 (see text).
(a) Before training, (b) after 20 iterations, (c) generalization with a random
sequence, (d) tracking of periodic signals.

4. Conclusions 

Walking with an FNS orthotic system was shown here to pose some major control
problems, namely high dimensionality, strong interaction between joints,
nonlinearities, and nonstationary system parameters. The decoupling scheme
proposed here works well in theory; in practice, however, complete and continuous
decoupling is hard to achieve because of the nonstationarities the system
exhibits. The ANN adaptive scheme presented here can cope with the nonlinearities
of the system and with long-term nonstationarities, and thus provides a solution
to some of the inherent difficulties in controlling locomotion by FNS.







Acknowledgement. This work was supported by The Foundation for Research
in Electronics, Computers and Communication, administered by The Israel
Academy of Science and Humanities.

References 



1. Kralj, A., Bajd, T., Turk, R., Krajnik, J. and Benko, H.: “Gait Restoration in
Paraplegic Patients: A Feasibility Demonstration using Multichannel Surface
Electrode F.E.S.”, Journal of Rehabilitation Research and Development, Vol. 20,
No. 1, pp. 3-20, Veterans Administration, 1983.

2. Braun, Z., Mizrahi, J., Najenson, T. and Graupe, D.: “Activation of Paraplegic
Patients by Functional Electrical Stimulation: Training and Biomechanical
Evaluation”, Scand. J. Rehab. Med., Suppl. Vol. 12, pp. 93-101, 1985.

3. Chizeck, H.J., Lalonde, R., Chang, C.W., Rosenthal, J.A. and Marsolais, E.B.:
“Performance of a Closed-Loop Controller for Electrically Stimulated Standing in
Paralysed Patients”, Proc. 8th Am. RESNA Conf., Memphis, Tennessee, 1985.

4. Goodwin, G.C. and Sin, K.W.: “Adaptive Filtering Prediction and Control”,
Prentice-Hall Inc., 1984.

5. Allin, J. and Inbar, G.F.: “FNS Control Schemes for the Upper Limb”, IEEE Trans.
Biomed. Eng., Vol. BME-33, No. 9, pp. 809-817, September 1986.

6. Hatwell, M.S., Oderkerk, B.J. and Inbar, G.F.: “A Model Reference Adaptive
Controller to Control the Knee Joint of Paraplegics”, ICCON, Jerusalem, April 1989.

7. Hatwell, M.S., Oderkerk, B.J. and Inbar, G.F.: “The Development of a Model
Reference Adaptive Controller to Control the Knee Joint of Paraplegics”, IEEE
Trans. on AC, Vol. 36, No. 6, pp. 683-691, June 1991.

8. Oderkerk, B.J. and Inbar, G.F.: “Walking Cycle Recording and Analysis for FNS
Assisted Paraplegic Walking”, MBEC, Vol. 29, No. 1, pp. 79-83, 1991.

9. Inbar, G.F. and Yafe, A.: “Parameter and Signal Adaptation in the Stretch Reflex
Loop”, Progress in Brain Res., Vol. 44, pp. 317-337, 1976.

10. Inbar, G.F.: “Muscle Spindles in Muscle Control: Analysis of Adaptive System
Model”, Kybernetik, Vol. 11, pp. 130-141, 1972.

11. Levin, E., Gewirtzman, R. and Inbar, G.F.: “Neural Network Architecture for
Adaptive System Modeling and Control”, Neural Networks, Vol. 4, pp. 185-191, 1991.

12. Hemami, H. and Farnsworth, R.L.: “Postural and Gait Stability of a Planar Five
Link Biped by Simulation”, IEEE Trans. Auto. Cont., pp. 452-458, June 1977.

13. Hearn, A.C.: “REDUCE - User’s Manual Version 3.2”, Rand Publication No.
CP78, The Rand Co., Santa Monica, Calif., April 1985.

14. Wolovich, W.A.: “Linear Multivariable Systems”, Appl. Mathematical Sciences,
Vol. 11, Springer Verlag, 1974.

15. Sinha, P.K.: “Multivariable Control, An Introduction”, Electrical Eng. and
Electronics, Vol. 19, Marcel Dekker Inc., 1984.

16. Minsky, M. and Papert, S.: “Perceptrons”, MIT Press, 1969.

17. Levin, E.: “A Recurrent Neural Network: Limitations and Training”, accepted for
publication in Neural Networks Journal, 1988.

18. Hopfield, J.J.: “Human Memory Error Correcting Codes and Spin Glasses”, Proc.
Natl. Acad. Sci. U.S.A. 79, 1982.

19. Lippmann, R.P.: “An Introduction to Computing with Neural Nets”, IEEE ASSP
Magazine, April 1987.

20. Widrow, B. and Stearns, S.: “Adaptive Signal Processing”, Prentice-Hall, 1985.

21. Gewirtzman, R. and Inbar, G.F.: “ANN Adaptive Controller using Threshold
Decomposition Architecture”. In preparation.




How Fast Can a Legged Robot Run? 



Jeff Koechling 1 and Marc H. Raibert 2 

1 Sibley School of Mechanical and Aerospace Engineering 
Cornell University, Ithaca, NY 14853-7501, USA 

2 Artificial Intelligence Laboratory 

Massachusetts Institute of Technology, Cambridge, MA 02139, USA 

Abstract. Several parameters can limit the running speed of a legged sys- 
tem. Among them are the strength, length, and stiffness of the legs, the range 
of joint motion, and the actuator force-velocity characteristics. We have
explored how varying these parameters affects top running speed. We developed
a dependency tree that suggests that a robot should have long, strong, stiff 
legs, and actuators with high peak velocity in order to run fast. We have 
also proposed three ways to improve the control of body attitude in high 
speed running: keeping hip motions symmetric, compensating for actuator 
characteristics, and accelerating the hip joint in anticipation of touchdown. 
In laboratory experiments a planar two-legged robot has reached a top speed 
of 5.9 m/s (13 mph). 

Keywords. Running, Locomotion, Speed. 

1 Introduction 

“How fast can it go?” Since time immemorial, people have staged races to 
compare the speed of people, animals, or vehicles. Speed excites people, 
often because it is the focus of a competition, and sometimes because of the 
danger or novelty of going fast. Speed is also an easily understood measure 
of performance. It summarizes the capabilities of a complex system with a 
single number that says something about an athlete’s prowess, an animal’s 
likelihood of survival, or a vehicle’s utility. 

Three things limit the speed of a vehicle: the power available to overcome 
drag, the ability of the structure to withstand loads, and the stability of the 
motion in the face of disturbances. As the system accelerates it encounters 
a limit of power, strength, or stability that establishes its maximum speed. 
If the limitation is power, then at maximum speed the drag force cancels the 
thrust force, leaving no thrust to accelerate the system. If the limitation is 
strength, then at maximum speed the loading on some component equals its 
strength, and any increase in speed would cause it to break. If the limitation
is stability, then at the maximum speed some equilibrating mechanism is at
its stability limit, and at any higher speed the system would tumble out of 
control. Most vehicles are designed so that their speed is limited by power, 
rather than by structure or stability, since a simple inability to accelerate is 
preferable to structural failure or loss of control. 

How fast a legged system can run depends on its design, and on how it 
is controlled. The important parts of the design are the legs and the hips. 
To run fast, the legs should be long, strong, springy, and stiff, and the hips 
should be able to rotate rapidly and through a large angle. The control 
system must coordinate the actions of the legs and hips so as to regulate 
the momentum of the system in the horizontal, vertical, and rotational di- 
rections. The principles of symmetry, modeling, and anticipation help the 
control system to regulate rotation of the body. 

Running speed is the product of step length and step frequency. The simple 
prescription for fast running is to take long steps, and to take them quickly. 
Step length depends on leg length and hip joint range of motion, while step- 
ping rate depends on leg stiffness, leg strength, and hip rotation rate. 

Leg length — Long legs allow long steps, so legs should be long for fast 
running. The distance that a legged system can move forward while its foot 
is on the ground is proportional to how long its legs are. Mass, moment of 
inertia, and strength all depend on leg length, and there is a limit to how big 
a leg can be and still be strong enough to support its own weight and the 
inertial forces required to move it. 

Hip joint rotation — The distance that a legged system can travel for- 
ward during a bounce depends not only on the leg length, but also on the 
range of motion of the hip joint. The hip joint limits how far the leg can 
pivot during stance without disturbing the attitude of the body. 

Hip rotation rate — Hip rotation rate can directly limit running speed. 
During stance, a legged system is like a polar manipulator. The foot remains 
fixed on the ground, and the leg length and hip angle determine the position 
of the body with respect to the foot. The hip position, plus the rate of 
extension of the leg and the rate of rotation of the hip joint determine the 
velocity of the body. The faster the hip joint can rotate, the faster the body 
can advance during stance. 
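The polar-manipulator picture can be made concrete with a short Python sketch. It assumes a planar leg pivoting about a fixed foot, with the body attitude held constant so that the hip joint rate equals the leg's angular rate; these are simplifying assumptions, not statements from the text. At mid-stance the leg is vertical and not extending, so forward speed is simply leg length times hip rate. The 0.8 m leg length and 5.9 m/s speed are the planar biped values in Table 2. The resulting hip rate of about 7.4 rad/s is only an estimate under this simplified model, not a measured value. The function names are hypothetical.

```python
import math


def body_velocity(r, theta, r_dot, theta_dot):
    """Hip (body) velocity during stance for a leg pivoting about a fixed foot.

    r: leg length (m); theta: leg angle from vertical (rad), positive forward;
    r_dot: leg extension rate (m/s); theta_dot: leg angular rate (rad/s).
    Returns (vx, vz) in m/s.
    """
    vx = r_dot * math.sin(theta) + r * theta_dot * math.cos(theta)
    vz = r_dot * math.cos(theta) - r * theta_dot * math.sin(theta)
    return vx, vz


def hip_rate_needed(speed, r):
    """Leg angular rate (rad/s) needed at mid-stance (theta = 0, r_dot = 0) to carry the body at speed."""
    return speed / r


rate = hip_rate_needed(5.9, 0.8)          # planar biped: 0.8 m leg, 5.9 m/s
vx, vz = body_velocity(0.8, 0.0, 0.0, rate)
```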

Leg stiffness — A running system alternately bounces off of the ground 
and flies through the air. For the system to bounce, the legs must be springy. 
During each bounce against the ground, ground contact forces reverse the 
vertical momentum of the system. As the legs compress during stance, they 
build up force, and the vertical component of that force reverses the vertical 
momentum of the system. The stiffer the springs are, the faster the forces 
build up and the more quickly the system bounces. 

Leg strength — The total impulse required to reverse the vertical
momentum of the body is the integral of the contact force over the duration of
the bounce. The legs must be strong enough to transmit the ground reaction
force to the body without breaking or buckling. The shorter the bounce, the 
larger the contact forces must be. Thus, the stronger the legs are, the faster 
the system can bounce off of the ground without damaging the legs. 
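The stiffness and strength arguments can be illustrated with a vertical spring-mass model. This is a standard simplification assumed here, not taken from the text: a point mass m on a massless linear leg spring of stiffness k, landing with downward speed v. With ω = √(k/m), static compression x_e = mg/k and oscillation amplitude A about x_e, the contact time is (π + 2 arcsin(x_e/A))/ω. The mean contact force follows from the impulse 2mv + mg·t_c. The mass, stiffnesses and speed in the usage lines are hypothetical.

```python
import math

G0 = 9.81  # gravitational acceleration, m/s^2


def stance(m, k, v_td):
    """Vertical bounce of a point mass m on a massless linear leg spring of stiffness k.

    v_td: downward vertical speed at touchdown (m/s).
    Returns (contact_time_s, mean_force_N, peak_force_N).
    """
    w = math.sqrt(k / m)
    x_eq = m * G0 / k                                # static compression under body weight
    amp = math.hypot(x_eq, v_td / w)                 # oscillation amplitude about x_eq
    t_c = (math.pi + 2 * math.asin(x_eq / amp)) / w  # time the spring stays compressed
    mean_force = 2 * m * v_td / t_c + m * G0         # reverse momentum and support weight
    peak_force = k * (x_eq + amp)
    return t_c, mean_force, peak_force


soft = stance(10.0, 5000.0, 1.0)    # hypothetical 10 kg body, soft leg
stiff = stance(10.0, 20000.0, 1.0)  # same body, leg four times stiffer
```

In this model the stiffer leg shortens the bounce and raises both the mean and the peak contact force. That is why stiffness and strength appear together as requirements for fast running.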

Symmetric leg motions — By moving its two legs symmetrically, a biped 
minimizes how much its body attitude deviates from the nominal angle. If the 
hip joints are at the center of gravity of the body, then the only disturbances 
to the body attitude are caused by hip torques. Equal and opposite motions 
of the hip joints ensure that the hip torques cancel out, and thus do not 
disturb the body attitude. 

Actuator velocity compensation — During stance, the hip of the 
stance leg is pushed forward by the body, causing the hip to rotate at a 
rate proportional to running speed. Velocity dependent torques in the hip 
joints should be compensated so that the body is not rotated forward with 
the leg. 

Ground speed matching — When the leg touches down, an impulsive 
contact force brings the unsprung mass of the foot to rest. At high speeds, 
this impulse is not aligned with the axis of the stance leg, so it tends to 
rotate the leg. The impulse happens very quickly, faster than the hip joint 
can respond, so some torque is transmitted to the body, which also begins to 
rotate. If the control system anticipates the touchdown, and accelerates the 
hip joint before impact, then the impulse is aligned with the axis of the leg. 
In this case the hip joint does not transmit any torque to the body. 

Taken together, symmetric leg motions, actuator velocity compensation, 
and ground speed matching substantially reduce body attitude disturbances 
associated with running fast. 

In the following section we review relevant previous studies. Then we de- 
velop a dependency tree that expresses the speed of a running system in 
terms of its physical parameters. Finally, we present laboratory experiments 
suggested by the analysis. 

2 Background 

The study of running is interdisciplinary. Some areas of research that provide 
results helpful in understanding running speed are: 

• Studies of running animals 

• Creation of artificial legged systems (robots) 

• Investigation of the performance of vehicles in general 

Biologists have studied innumerable aspects of running animals, including 
their structure, the motions and forces that occur during running, and the 
energy consumed for different speeds and gaits. Robots and vehicles that
travel on legs provide a way to study walking and running in simple, easily
instrumented systems, without the complexity inherent to biological systems. 
Research into the performance of boats, aircraft, and land vehicles is relevant 
to studying running speed, because the task of locomotion and the physical 
principles of support, balance, and progress are common to all vehicles. 

2.1 Biomechanical Research 

2.1.1 Scaling 

Biologists have proposed various similarity models to describe how the shape, 
structure, and motions of an animal depend on its size. Similarity models 
either describe measurements gathered from animals that are similar in ar- 
rangement but vary in size, or they describe the way that such measurements 
should vary in order to keep some quantity the same at all sizes. 

Similarity models apply to all mechanisms, not just biological ones. A 
mechanism design only works over a limited range of sizes. Typically, at a 
very small size the ratio of viscous forces to inertial forces increases and the 
resultant damping prevents the mechanism from operating. At a very large 
size, gravitational forces become dominant and exceed the strength of the 
materials. 

Hill (1950) concluded that speed is independent of size for animals of similar 
design. An animal makes movements that are proportional to its size, but at 
a frequency that is inversely proportional to its size. For geometrically similar 
animals, the differences cancel out, so top speed is the same regardless of size. 
Hill introduced the idea of physiological time, saying that animals live on a 
time scale proportional to body size. Thus large animals live longer than 
small animals, their hearts beat more slowly, and they take more time for 
each running step. Hill also noted that for large animals, a greater portion 
of skeletal and muscular strength is required to support the animal’s weight 
than for small animals. 

McMahon (1975) compared and discussed three scaling laws: geometric
similarity, elastic similarity, and static stress similarity. Geometric
similarity preserves shape across scale, as all linear dimensions change with the
same scale factor. Elastic similarity preserves resistance to column buckling.
For columns of different size to have the same safety factor against buckling,
long columns must be relatively thicker than short columns: the scale factor for
the diameter of elastically similar columns is the scale factor for length raised
to the 3/2 power. Static stress similarity preserves resistance to bending
failure in simply supported beams bearing their own weight. In this case, the
scale factor for diameter is the square of the scale factor for length.




Comparison of Three Similarity Models

                           geometric      elastic        static stress
                           similarity     similarity     similarity
length, l                  ∝ W^(1/3)      ∝ W^(1/4)      ∝ W^(1/5)
diameter, d                ∝ W^(1/3)      ∝ W^(3/8)      ∝ W^(2/5)
surface area, S            ∝ W^(2/3)      ∝ W^(5/8)      ∝ W^(3/5)
cross-sectional area, A    ∝ W^(2/3)      ∝ W^(3/4)      ∝ W^(4/5)
natural frequency, ω       ∝ W^(-1/3)     ∝ W^(-1/8)     ∝ W^0
speed, V                   ∝ W^0          ∝ W^(1/4)      ∝ W^(2/5)



Table 1: Three similarity principles predict different variations in shape and 
speed as a function of body weight, W. The table is adapted from a paper by 
McMahon (1985), who cites examples of animal measurements that match the 
elastic similarity model. 

McMahon presented a variety of evidence that the design of animals is in 
accordance with elastic similarity. In particular, the running speed of animals 
is proportional to body weight raised to the 1/4 power, as predicted by elastic 
similarity. Geometric similarity predicts an exponent of zero, and static stress 
similarity predicts an exponent of 2/5. Table 1 includes a few predictions of 
the three similarity models. 
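The exponents in Table 1 can be used directly. The Python sketch below, with hypothetical helper names, predicts how top speed changes between two animals of different weight under each model. For a 16-fold weight difference the predictions are no change, a doubling, or roughly a threefold increase. It also checks that the tabulated exponents give d ∝ l^(3/2) for elastic similarity and d ∝ l² for static stress similarity, which are the relations behind McMahon's buckling and bending arguments.

```python
from fractions import Fraction

# Exponents e in "quantity ∝ W^e", transcribed from Table 1.
LENGTH_EXP = {"geometric": Fraction(1, 3), "elastic": Fraction(1, 4), "static stress": Fraction(1, 5)}
DIAMETER_EXP = {"geometric": Fraction(1, 3), "elastic": Fraction(3, 8), "static stress": Fraction(2, 5)}
SPEED_EXP = {"geometric": Fraction(0), "elastic": Fraction(1, 4), "static stress": Fraction(2, 5)}


def speed_ratio(weight_ratio, model):
    """Predicted top-speed ratio between two animals whose weights differ by weight_ratio."""
    return weight_ratio ** float(SPEED_EXP[model])


def diameter_vs_length_exponent(model):
    """Exponent p in d ∝ l^p implied by the table's weight exponents."""
    return DIAMETER_EXP[model] / LENGTH_EXP[model]


ratios = {model: speed_ratio(16, model) for model in SPEED_EXP}
```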

Alexander (1988) found that animals weighing over 20 kg scale according 
to elastic similarity, while the dimensions of smaller animals obey geomet- 
ric similarity. Alexander also describes an extension of geometric similarity 
called dynamic similarity. For geometric similarity, animals of different sizes 
undergo a uniform scaling of linear dimensions. For dynamic similarity, the 
scaling of linear dimensions is accompanied by uniform scaling of time and 
force. Thus, dynamic similarity specifies not only the change in shape of 
animals at different sizes, but also changes in their motions. Alexander pro- 
poses that the scale factor for forces be the cube of the scale factor for length, 
and that the scale factor for time be the square root of the scale factor for 
length. These scale factors work for motions characterized by gravitational 
and inertial forces. However, Alexander points out that if both elastic and 
gravitational forces are important it is impossible to maintain strict dynamic 
similarity. 
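The arithmetic behind these particular factors is short. Suppose lengths scale by k, time by √k, and forces by k³, and assume equal density so that mass also scales as k³. Then speeds scale by √k and accelerations are unchanged, so g is the same at every size, while forces stay proportional to weight. The Python helper below is a hypothetical illustration of that bookkeeping.

```python
def dynamic_similarity_factors(k):
    """Scale factors when all lengths scale by k under dynamic similarity (equal density assumed)."""
    length = k
    time = k ** 0.5
    force = k ** 3
    mass = k ** 3          # assumption: equal density, so mass scales like volume
    return {
        "length": length,
        "time": time,
        "force": force,
        "speed": length / time,
        "acceleration": length / time ** 2,
        "force_per_weight": force / mass,
    }


factors = dynamic_similarity_factors(4.0)   # an animal with four times the linear dimensions
```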

2.1.2 Energetics 

The power required for running increases with speed. The environment op- 
poses motion with drag forces, dissipating power equal to the product of the 
forward speed and the drag force. Drag forces remain constant or increase 
with running speed. For example, the gravitational drag caused by climbing
a hill is independent of speed, while aerodynamic drag increases as the square
of running speed. 
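The consequence for power can be sketched in a few lines of Python. The dissipated power is P = F_drag·v, so a speed-independent drag (hill climbing) makes power grow linearly with speed, while drag proportional to v² makes it grow as v³. The function name and the coefficient values below are hypothetical.

```python
def drag_power(speed, grade_force=0.0, aero_coeff=0.0):
    """Power (W) dissipated against drag, P = F_drag * v.

    grade_force: speed-independent drag (N), e.g. m*g*sin(slope) when climbing.
    aero_coeff: k in F_aero = k * v**2 (N s^2/m^2).
    """
    drag_force = grade_force + aero_coeff * speed ** 2
    return drag_force * speed


aero_gain = drag_power(2.0, aero_coeff=0.5) / drag_power(1.0, aero_coeff=0.5)       # doubling speed
grade_gain = drag_power(2.0, grade_force=10.0) / drag_power(1.0, grade_force=10.0)
```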

Several investigators have measured the oxygen consumption of running
animals. Consumed oxygen yields metabolic energy at a rate of 20.1 J per ml of
O₂. The rate of oxygen consumption therefore indicates the metabolic power
produced by the animal. Some of this power is used to overcome the resistance of
the environment, some is dissipated in muscle inefficiency, and some is used to
maintain the animal’s basal metabolism.

The rate of metabolic energy consumption increases with running speed.
Taylor, Heglund, and Maloiy (1982) report that the metabolic power consumed
during running is

$$\frac{\dot{E}_{metab}}{M_b} = 10.7\,M_b^{-0.316}\,v_g + 6.03\,M_b^{-0.303} \qquad (1)$$

where Ė_metab is the rate of metabolic energy consumption in watts, M_b is the
body mass in kg, and v_g is the running speed in m/s. The equation is based on
measurements from 60 species of animals, ranging in size from 0.0072 kg pygmy
mice to 254 kg zebu cattle. It indicates that energy consumption increases
linearly with running speed, and that the rate of increase per unit body mass is
smaller for large animals than for small animals.
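Equation (1) and the 20.1 J per ml O₂ conversion can be evaluated directly. The Python sketch below uses function names of our own. It returns metabolic power and the equivalent oxygen uptake, and its usage lines check the two properties stated above: power is linear in speed, and the slope per unit body mass falls with size. Because the equation is an empirical fit, it should only be applied within the 0.0072 kg to 254 kg range of the original data.

```python
J_PER_ML_O2 = 20.1   # metabolic energy released per ml of O2 consumed


def metabolic_power(mass_kg, speed_ms):
    """Eq. (1), Taylor, Heglund and Maloiy (1982): metabolic power in watts."""
    per_kg = 10.7 * mass_kg ** -0.316 * speed_ms + 6.03 * mass_kg ** -0.303
    return per_kg * mass_kg


def oxygen_rate_ml_per_s(power_w):
    """Oxygen uptake (ml O2/s) equivalent to a metabolic power in watts."""
    return power_w / J_PER_ML_O2


# Hypothetical 20 kg animal at three speeds, and the per-kg slope for a small vs a large animal.
p0, p1, p2 = (metabolic_power(20.0, v) for v in (0.0, 1.0, 2.0))
slope_small = (metabolic_power(2.0, 2.0) - metabolic_power(2.0, 1.0)) / 2.0
slope_large = (metabolic_power(200.0, 2.0) - metabolic_power(200.0, 1.0)) / 200.0
```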

Dawson and Taylor (1973) studied the energetic cost of locomotion in kan- 
garoos. They found that over a range of speeds from 2 m/s to 6 m/s, hopping 
frequency and energy consumption remain nearly constant, and the stride 
length increases in proportion to the speed. This is in contrast to the linear 
increase in energy consumption with speed indicated by equation 1. Dawson 
and Taylor attributed the constant consumption of energy with increasing 
speed to increased storage and recovery of elastic energy, particularly in the 
kangaroo’s large Achilles tendon. 

Alexander and Vernon (1975) measured the ground forces exerted by hop- 
ping kangaroos using force plates, and combined the measurements with film 
records of the motion to determine the fluctuations of energy during hopping. 
They calculated that elastic storage of energy in the kangaroo’s Achilles ten- 
don reduced the energetic cost of hopping by 40%. Alexander and Vernon 
also noted that the kangaroo’s tail rotates in the opposite direction from the 
legs, in a way that reduces the angular motions of the body. Large sheets of 
elastic tendon along the tail contribute to its oscillation. 

McMahon and Greene (1978, 1979) considered not only the compliance 
of a runner’s legs, but the compliance of the ground as well. Their model 
predicted that top running speed would be slightly greater on a compliant 
track than on a rigid surface. The model predicted that the fastest running 
would be on a track four times as stiff as the runner’s legs. McMahon and 
Greene built such a track at Harvard University, and observed the predicted 
2% increase in running speed, along with a decrease in injuries. This study 
points out the importance of the environment in determining the top speed 
of a running system. 




Legged Robots and Vehicles

Machine                       Leg length (m)   Speed (m/s)   Speed (mph)

6 legs
  OSU Hexapod (McGhee)             0.8            0.3            0.7
  USSR Hexapod (Gurfinkle)         0.35           0.1            0.2
  SSA Hexapod (Sutherland)         1.0            0.14           0.31
  ODEX (Odetics)                   1.3            0.5            1.1
  ASV (Waldron)                    1.9            2.2            5.0
4 legs
  PV II (Hirose)                   0.87           0.5            1.1
  Quadruped (Raibert)              0.66           3.0            6.7
2 legs
  WL10-RD (Kato)                   0.96           0.23           0.51
  Biper-3 (Miura)                  0.20           0.02           0.04
  MEG-2 (Funabashi)                0.48           0.5            1.1
  Kenkyaku (Furusho)               0.72           0.8            1.8
  Planar Biped (Koechling)         0.8            5.9            13.1

Table 2: These are a few of the machines and vehicles that have been used to 
study the control of legged locomotion. None of the machines was explicitly 
designed for high speed. All of the leg lengths and speeds are approximate. 

Hoyt and Taylor (1985) studied the energetics of horses walking, trotting 
and galloping. For each gait they found one speed that provided the best 
energy efficiency, and if allowed to move freely, a horse always chose a speed 
and gait that corresponded to a local maximum in efficiency. 

2.2 Robotic Research 

There are very few artificial systems that walk or run. Walking and running 
require dynamic stability, meaning that the system moves continuously in 
order to keep the average point of support beneath the center of gravity. 
Table 2 lists the number of legs, leg length, and speed of several legged 
robots. Keep in mind that none of these systems was designed for speed. 
Each was designed to study some aspect of the control of locomotion. 

Huang and Waldron (1987) derived the relationship between weight and 
maximum speed for a hexapod vehicle crawling with a particular gait. By 
assuming that the distribution of forces among the support legs was a linear 
function of the forward and lateral position of the legs, and requiring the 
vehicle to remain in static equilibrium, they determined the proportion of 
the vehicle’s weight on each leg as a function of speed. By limiting the force
on the most heavily loaded leg to the maximum safe load they computed the
tradeoff between speed and payload. 

2.3 Vehicular Research 

Gabrielli and von Kármán (1950) studied the cost of locomotion at different
speeds. They gathered data on the gross weight, installed power, and
maximum speed of many land vehicles, ships, boats, aircraft and animals.
For each vehicle they computed the specific resistance, which is the ratio of
power to the product of weight and velocity: $\varepsilon = P/(WV)$. This
nondimensional quantity is a measure of the energetic cost of locomotion.
Gabrielli and von Kármán’s data indicate that any particular means of
locomotion is only energy efficient over a narrow range of speeds, and that in
general small, fast vehicles are less efficient than large, slow vehicles. For
example, a merchant ship has a specific resistance of about 0.003 at a speed
of 6 m/s, while a jet fighter plane has a specific resistance of 0.3 at a speed
of 300 m/s.
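Because ε is nondimensional, it can be rearranged to give the power needed per newton of weight, P/W = εV. The Python sketch below applies that rearrangement to the two examples quoted above. The helper names are hypothetical.

```python
def specific_resistance(power_w, weight_n, speed_ms):
    """Specific resistance of Gabrielli and von Karman: epsilon = P / (W V)."""
    return power_w / (weight_n * speed_ms)

def power_per_unit_weight(epsilon, speed_ms):
    """Power (W) needed per newton of weight: P/W = epsilon * V."""
    return epsilon * speed_ms

# Figures quoted in the text.
ship = power_per_unit_weight(0.003, 6.0)      # merchant ship at 6 m/s
fighter = power_per_unit_weight(0.3, 300.0)   # jet fighter at 300 m/s
print(f"merchant ship: {ship:.3f} W/N, jet fighter: {fighter:.1f} W/N, "
      f"ratio {fighter / ship:.0f}x")
```

Per newton of weight, the fighter needs thousands of times more power than the ship. Part of this comes from its 50-fold higher speed and part from its 100-fold higher specific resistance.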

The graphs of specific resistance as a function of speed show a limiting line, 
a minimum specific resistance that increases with speed. This line represents 
an efficiency limit imposed by aerodynamic or hydrodynamic drag. Single 
vehicles of sufficient size should have specific resistances below the limit, 
because specific resistance decreases with vehicle size. Railroad trains achieve 
a greater energetic efficiency because the aerodynamic drag is smaller for the 
train than for the individual cars, and because the power is supplied by a few 
large, efficient engines. Gabrielli and von Kármán point out that the size of
the vehicles that can be built is limited by the strength to weight ratio of the 
available construction materials. 



3 The Dependency Tree 

One way to impose structure on the relationship between the physical pa- 
rameters of a mechanism and how fast it can run is to build a dependency 
tree, as shown in figure 1. The top of the tree is running speed. The branches 
are formed by expressing running speed as the ratio of step length to step 
period, and successively refining those quantities to simpler characteristics of 
the running motion. The leaf nodes are parameters of the links, joints, and 
actuators. Body attitude is an exception; it is determined by how well the 
control system corrects disturbances. 

The equations in figure 1 embody several assumptions, which are explicitly 
stated in the paragraphs below. The resulting analysis accurately represents 
the kinematics of the mechanism, but it uses simplified dynamics, and says 
nothing about energetics. It leads to tractable expressions for running speed 
that qualitatively predict how parameter variations affect speed. 




[Figure 1: dependency tree. Row (1): Speed, $V = S/T$. Row (2): Step Length, Step Period.]




Figure 1: A tree indicating some of the ways the running speed of a legged sys- 
tem depends upon parameters of the mechanism. The top of the tree represents 
speed, a measure of performance. The intermediate rows represent character- 
istics of the running motion, and the bottom row represents the parameters 
of the mechanism. The state variables of the system are the leg length, the
rate of leg extension, the leg angle, and the rate of leg rotation. The
operating point is described by the state at the moment of liftoff: $\ell$,
$\dot{\ell}$, $\theta$, and $\dot{\theta}$. The horizontal and vertical components
of liftoff velocity are $\dot{x}$ and $\dot{z}$. If the steps are symmetric and the
pitch angle of the body is uniformly zero, then the operating point completely
characterizes the motion. Steps are symmetric if the horizontal velocity and
the leg length are the same at touchdown as at liftoff, and the vertical
velocity and the leg angle change signs from touchdown to liftoff. The
coordinate system is shown by figure 2.




Figure 2: (a) The horizontal and vertical distances from the foot to the hip are
determined by the leg length and leg angle: $x = \ell\sin\theta$, $z = \ell\cos\theta$.
During stance the foot is motionless, so the derivatives of the hip coordinates
give the horizontal and vertical velocity with respect to the ground:
$\dot{x} = \dot{\ell}\sin\theta + \ell\dot{\theta}\cos\theta$,
$\dot{z} = \dot{\ell}\cos\theta - \ell\dot{\theta}\sin\theta$. (b) The leg angle is the
sum of the leg angle with respect to the body and the angle of the body with
respect to the ground: $\theta = \phi + p$. The three angles, $\theta$, $\phi$, and
$p$, are measured clockwise from the nominal position, in which the leg is
vertical and the body horizontal.

3.1 Speed 

The first row of the dependency tree in figure 1 is the definition of running
speed ($V$), which is the ratio of the forward progress on each step ($S$) to the
time required to complete that step ($T$):

$$V = S/T. \qquad (2)$$

To increase its speed, a running system must take longer steps, more frequent
steps, or both. In bipedal running, stance and flight proceed in strict
alternation, and a step consists of exactly one stance and one flight. Figure 3
shows that the distance traveled during one step is the sum of the distance
traveled during stance ($S_s$) and the distance traveled during flight ($S_f$):

$$S = S_s + S_f. \qquad (3)$$

Likewise, the time required for a step is the sum of the stance duration ($T_s$)
and the flight duration ($T_f$):

$$T = T_s + T_f. \qquad (4)$$

These definitions make up the second row of the tree. 

In the third row of the tree, the dynamics of the system come into play. 
In the stance phase, the system resembles both a mass bouncing on a spring 
and an inverted pendulum pivoting over its fulcrum, as shown in figure 4. 
The dynamics are much simpler during flight, when the system approximates 
a rigid ballistic projectile rising and falling under the influence of gravity. 




Figure 3: Step length ($S$) is the sum of the forward progress of the hip during
stance ($S_s$) and the forward progress of the hip during flight ($S_f$).







Figure 4: The motion of a running system during stance results from the 
interaction of two simpler motions. The vertical motion is predominantly the 
bouncing motion of a spring-mass oscillator. The forward travel results from 
the tipping motion of an inverted pendulum that moves first toward and then 
away from the unstable equilibrium point. Forward speed decreases during the 
first half of stance, because some of the horizontal kinetic energy is temporarily 
stored in the leg spring. During the second half of stance, the spring releases 
energy and the system speeds back up. 








Figure 5: The distance traveled during stance is a function of the leg length 
and hip angle at the beginning and end of stance. If the motion is symmetric, 
so that the state at liftoff is a mirror image of the state at touchdown, the 
distance traveled is $2\ell\sin\theta$. The system in the figure is moving from left to
right, touching down with the foot in front of the hip and lifting off with the 
foot behind the hip. 

3.2 Stance



The leftmost and rightmost nodes of row three describe the stance duration 
and the stance travel. The stance travel is the forward progress of the body 
during stance, and is a function of the leg lengths and leg angles at the 
beginning and end of the stance phase, as shown in figure 5. For symmetric 
steps, the leg length and leg rotation rate are the same at touchdown and at 
liftoff, while the leg extension rate and the leg angle have opposite signs at 
touchdown and at liftoff. Under the assumption of symmetry, the distance 
traveled during stance is: 

$$S_s = 2\ell\sin\theta, \qquad (5)$$

where $\theta$ and $\ell$ represent the leg angle and leg length at liftoff.
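Equation 5 is the symmetric special case of a more general expression that follows from the geometry of figure 2. With the foot fixed, the hip moves from $x = \ell_{td}\sin\theta_{td}$ at touchdown to $x = \ell_{lo}\sin\theta_{lo}$ at liftoff. The Python sketch below computes both forms and checks that they agree when the step is symmetric. The function names and the sample leg length and angle are hypothetical.

```python
import math

def stance_travel_general(l_td, theta_td, l_lo, theta_lo):
    """Forward hip travel during stance with the foot fixed (figure 2 geometry).

    theta_td is negative (foot ahead of the hip at touchdown) and theta_lo is
    positive (foot behind the hip at liftoff).
    """
    return l_lo * math.sin(theta_lo) - l_td * math.sin(theta_td)

def stance_travel_symmetric(l, theta):
    """Equation 5: S_s = 2 l sin(theta) for a symmetric step."""
    return 2.0 * l * math.sin(theta)

# Symmetric step: same leg length, mirrored leg angle (illustrative values).
l, theta = 0.62, math.radians(20.0)
print(stance_travel_general(l, -theta, l, theta), stance_travel_symmetric(l, theta))
```

For an asymmetric step, the general form should be used with the measured touchdown and liftoff values.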

The motion during stance is described by a pair of coupled second-order 
non-linear differential equations. The stance duration can be computed by 
integrating these equations forward in time from the moment of touchdown 
until the moment of liftoff. We know of no closed form expression for the 
stance duration as a function of the mechanism parameters and the state of 
the system at touchdown. In order to proceed with the analysis, we pretend 
that the horizontal and vertical motion of the system are decoupled, and that 
the stance duration is determined only by the vertical motion. 

Figure 6: The distance traveled during flight is determined by the liftoff
velocity $(\dot{x}, \dot{z})$. For symmetric steps, in which the height of the body
above the ground is the same at touchdown as at liftoff, the duration of flight
is $2\dot{z}/g$. The forward progress during flight is $2\dot{x}\dot{z}/g$. The system
in the figure is moving from left to right, lifting off from the foot behind the
hip and landing on the foot in front of the hip.

For the simple case of a mass bouncing vertically on a linear spring, McMahon
(1986) showed that the time required to rebound from the ground depends on
a parameter he called the Groucho number. Modifying McMahon’s formula to
take the leg spring mechanical stops into account gives an expression for the
stance duration:

$$T_s = \begin{cases}
\dfrac{2}{\omega_0}\left[\pi - \arctan(N_G)\right] & \text{if } N_G > 0 \quad (P < mg) \\[1ex]
\dfrac{\pi}{\omega_0} & \text{if } N_G = \infty \quad (P = mg) \\[1ex]
\dfrac{2}{\omega_0}\arctan(-N_G) & \text{if } N_G < 0 \quad (P > mg)
\end{cases} \qquad (6)$$

where

$N_G$ is a modified Groucho number, $N_G = \dfrac{\omega_0\dot{z}}{g\,(1 - P/mg)}$,
$\omega_0$ is the natural frequency of the spring-mass system, $\sqrt{k/m}$,
$\dot{z}$ is the vertical velocity at liftoff,
$g$ is the acceleration of gravity,
$k$ is the leg stiffness,
$m$ is the body mass,
$P$ is the preload force.

The entry for stance duration in figure 1 is the third case of equation 6, where
the preload force exceeds the weight of the system.

Equation 6 gives the stance duration of a system bouncing in place on a
linear spring. Increasing the leg stiffness or the preload force shortens the
stance duration, whereas increasing the mass or the acceleration of gravity
lengthens it. Increasing the vertical velocity shortens the stance duration
when the preload is less than the weight, but lengthens it when the preload
exceeds the weight. The behavior is qualitatively similar for the more
complex case of a nonlinear spring and the leg pivoting about the foot. In the
nonlinear case, equation 6 can be used to predict stance duration by
assuming, computing, or measuring a value for the natural frequency. The
natural frequency depends on the effective vertical stiffness, which depends
on the sweep angle and impact velocity as well as on the stiffness of the leg.
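Equation 6 can be checked against a direct simulation of the same idealized model. The Python sketch below assumes exactly that model: a point mass on a massless linear spring with preload $P$, bouncing vertically, with contact ending when the spring returns to its mechanical stop. It integrates the motion with a small fixed time step. The mass, stiffness, preloads, and impact speed are hypothetical values chosen only to exercise the $P < mg$ and $P > mg$ cases.

```python
import math

def stance_duration(zdot, k, P, m, g=9.81):
    """Closed-form stance duration from equation 6 (modified Groucho number)."""
    omega0 = math.sqrt(k / m)
    if P == m * g:                                     # N_G is infinite
        return math.pi / omega0
    n_g = omega0 * zdot / (g * (1.0 - P / (m * g)))    # modified Groucho number
    if n_g > 0.0:                                      # preload below the weight
        return (2.0 / omega0) * (math.pi - math.atan(n_g))
    return (2.0 / omega0) * math.atan(-n_g)            # preload above the weight

def simulate_stance(zdot, k, P, m, g=9.81, dt=1e-6):
    """Integrate the vertical bounce until the spring is back at its stop.

    y is spring compression (positive downward); touchdown is y = 0 with
    downward speed zdot. Semi-implicit Euler keeps the oscillation stable.
    """
    y, v, t = 0.0, zdot, 0.0
    while True:
        a = g - (P + k * y) / m    # net downward acceleration while in contact
        v += a * dt
        y += v * dt
        t += dt
        if y < 0.0:                # spring back at its mechanical stop: liftoff
            return t

m, k, zdot = 10.0, 4000.0, 2.0     # hypothetical: 10 kg, 4000 N/m, 2 m/s
for P in (50.0, 150.0):            # below and above the weight m*g = 98.1 N
    print(P, stance_duration(zdot, k, P, m), simulate_stance(zdot, k, P, m))
```

The third branch is the case used in figure 1. The first branch always gives at least half a natural period, because the body sinks past the point where the spring force balances its weight before rebounding.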







3.3 Flight 

The two nodes in the middle of row three of figure 1 describe the flight travel
and the flight duration. The formulas give the duration of flight and forward
progress of a rigid body that has an initial velocity $(\dot{x}, \dot{z})$ and is
accelerated only by gravity ($g$). The formulas thus ignore changes in the
location of the center of gravity due to the motions of the legs, and
accelerations due to aerodynamic drag. The rigid body assumption leads to
simple expressions for the flight duration and the flight travel:

$$T_f = 2\dot{z}/g \qquad (7)$$

$$S_f = 2\dot{x}\dot{z}/g. \qquad (8)$$
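Equations 7 and 8 are the standard ballistic results for a projectile that lands at the height from which it took off. The Python sketch below evaluates them. As a check, it also integrates the same drag-free point-mass motion numerically. The liftoff velocity used is an illustrative value, not a measurement.

```python
import math

def flight_duration(zdot, g=9.81):
    """Equation 7: time aloft when touchdown height equals liftoff height."""
    return 2.0 * zdot / g

def flight_travel(xdot, zdot, g=9.81):
    """Equation 8: forward progress during flight (horizontal speed constant)."""
    return 2.0 * xdot * zdot / g

def simulate_flight(xdot, zdot, g=9.81, dt=1e-5):
    """Integrate a drag-free point mass until it returns to its liftoff height."""
    x, z, vz, t = 0.0, 0.0, zdot, 0.0
    while True:
        vz -= g * dt
        z += vz * dt
        x += xdot * dt
        t += dt
        if z < 0.0:
            return t, x

t_sim, x_sim = simulate_flight(3.0, 2.0)   # illustrative liftoff velocity
print(flight_duration(2.0), t_sim, flight_travel(3.0, 2.0), x_sim)
```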



The third row of figure 1 divides running speed into four quantities: stance 
travel, stance duration, flight travel, and flight duration. These four quanti- 
ties depend on the quantities in row four, which are functions of the operating 
point, and on the parameters in row five, which characterize the mechanism. 
The behavior in stance depends on the mass of the system, the stiffness and 
preload of the leg spring, the length and angle of the leg at liftoff, and the 
regulation of body attitude. The behavior in flight depends on the joint posi- 
tions and velocities at liftoff. The following sections discuss the dependencies 
on operating point and mechanism parameters in more detail. 

3.4 Stance Travel

The system travels a distance $S_s = 2\ell\sin\theta$ during stance. The motion
during stance depends on the leg length and leg angle at touchdown, which
are chosen by the control system to make the motion during stance symmetric
and thus maintain a constant forward speed. The higher the speed, the
farther ahead of the hip the foot must be placed at touchdown. The design
of the leg limits the leg length, and the design of the hip joint limits the angle
of the leg with respect to the body.

The leg angle is limited by the angle of the leg with respect to the body, 
and by the angle of the body with respect to the ground. We call the angle of 
the body with respect to the ground the pitch of the body. With the body in 
its nominal orientation, the range of leg angles permitted by the hip joint is 
symmetric about vertical, allowing the leg to swing equally far forward and 
backward. Deviations of the body from its nominal orientation reduce either 
the distance that the leg can reach forward for touchdown or the distance that 
it can reach back before liftoff. Either case reduces top speed by reducing 
the travel that can be achieved during stance. 







If the body rotates away from its nominal pitch angle, the travel of the 
foot with respect to the hip is asymmetric. If the body is pitched forward, 
the distance that the foot can reach ahead at touchdown is reduced. During 
flight, the control system positions the foot in front of the hip by a distance 
that is proportional to running speed. Reducing the foot travel ahead of the 
hip reduces the maximum stable running speed, regardless of how much foot 
travel is gained behind the hip. If the body rotates backward, the foot can 
reach farther ahead of the hip for touchdown. However, the hip joint then 
reaches its limit of rearward travel before the end of stance, abruptly pitching 
the body forward. 
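This effect can be quantified with a sketch under stated assumptions. Suppose the hip joint allows the leg to swing $\pm\phi_{max}$ relative to the body, the pitch $p$ stays constant during stance, and the step is symmetric. Since $\theta = \phi + p$ (figure 2), the leg angle ranges over $[-\phi_{max} + p,\ \phi_{max} + p]$. The largest symmetric leg angle is therefore $\phi_{max} - |p|$, and the largest stance travel is $2\ell\sin(\phi_{max} - |p|)$. The hip limit, leg length, and pitch values below are hypothetical.

```python
import math

def max_symmetric_leg_angle(phi_max, pitch):
    """Largest liftoff angle theta for which the touchdown angle -theta is also reachable.

    Assumes theta = phi + pitch with |phi| <= phi_max and pitch constant in stance.
    """
    return max(0.0, phi_max - abs(pitch))

def max_stance_travel(leg_length, phi_max, pitch):
    """Equation 5 evaluated at the largest reachable symmetric leg angle."""
    return 2.0 * leg_length * math.sin(max_symmetric_leg_angle(phi_max, pitch))

l, phi_max = 0.62, math.radians(30.0)   # hypothetical leg length and hip limit
for pitch_deg in (0.0, 5.0, -5.0):
    s = max_stance_travel(l, phi_max, math.radians(pitch_deg))
    print(f"pitch {pitch_deg:+.0f} deg -> max stance travel {s:.3f} m")
```

Either sign of pitch shrinks the reachable symmetric sweep. This agrees with the observation above that body rotation in either direction reduces the travel available during stance.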

3.5 Stance Duration 

The duration of stance is $T_s = (2/\omega_0)\arctan\left[\omega_0\dot{z} / \left(g\,(P/mg - 1)\right)\right]$.
The expression is based on the assumption that the stance duration is
determined by the vertical motion of the system independent of the horizontal
motion. Other plausible expressions for the stance duration might be derived
from the strength
of the leg, from the decrease in forward speed during stance, and from the 
velocity and acceleration limitations of the hip joints. For figure 1, we chose 
the representation based on stiffness, because leg stiffness determines stance 
duration, while leg strength and hip joint properties might limit stance
duration. The stance duration must be long enough that the forces do not
break the leg, and that the hip joint has time to move from the touchdown
angle to the liftoff angle. The stance duration must be short enough, or the
speed slow enough, that the hip joint does not exceed its maximum angle of
rotation.

3.6 Flight Travel and Flight Duration 

During flight the center of gravity of the system moves along a parabolic
trajectory determined by the velocity at liftoff and the acceleration of gravity.
The liftoff velocity depends on the leg length, leg extension rate, leg angle,
and leg rotation rate. It has magnitude $\sqrt{(\ell\dot{\theta})^2 + \dot{\ell}^2}$ and
direction $\arctan(\dot{\ell}/\ell\dot{\theta}) - \theta$ above the horizontal. The
magnitude is independent of the leg angle. The horizontal and vertical
components are:

$$\dot{x} = \dot{\ell}\sin\theta + \ell\dot{\theta}\cos\theta \qquad (9)$$

$$\dot{z} = \dot{\ell}\cos\theta - \ell\dot{\theta}\sin\theta. \qquad (10)$$
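The magnitude and direction formulas follow from equations 9 and 10. The Python sketch below computes the components and the polar form from the same liftoff state, so the two can be compared. The state values are hypothetical, chosen so that the hip is rising at liftoff ($\dot{z} > 0$).

```python
import math

def liftoff_velocity(l, ldot, theta, thetadot):
    """Equations 9 and 10: hip velocity components at liftoff (foot fixed)."""
    xdot = ldot * math.sin(theta) + l * thetadot * math.cos(theta)
    zdot = ldot * math.cos(theta) - l * thetadot * math.sin(theta)
    return xdot, zdot

def liftoff_speed_and_direction(l, ldot, theta, thetadot):
    """Magnitude sqrt((l*thetadot)^2 + ldot^2); direction atan(ldot/(l*thetadot)) - theta."""
    magnitude = math.hypot(l * thetadot, ldot)
    direction = math.atan2(ldot, l * thetadot) - theta   # angle above horizontal
    return magnitude, direction

state = (0.62, 2.0, math.radians(15.0), 4.0)   # hypothetical l, ldot, theta, thetadot
xdot, zdot = liftoff_velocity(*state)
mag, ang = liftoff_speed_and_direction(*state)
print(f"xdot={xdot:.3f} m/s, zdot={zdot:.3f} m/s, |v|={mag:.3f} m/s, "
      f"direction={math.degrees(ang):.1f} deg")
```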







3.7 Speed Equations 



Combining the formulas in figure 1 yields a single equation that expresses 
running speed as a function of operating point and mechanism parameters. 
The top row of the tree defines running speed as V = S/T. Breaking up the 
step length and step period expands the definition to: 



$$V = \frac{S_f + S_s}{T_f + T_s}. \qquad (11)$$



Incorporating the definitions of stance travel, flight travel, flight duration, 
and stance duration gives: 



$$V = \frac{\dot{x}\dot{z} + g\ell\sin\theta}{\dot{z} + (g/\omega_0)\arctan\left(\dfrac{\omega_0\dot{z}}{g\,(P/mg - 1)}\right)}. \qquad (12)$$



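Finally, replacing the Cartesian components of liftoff velocity according to
equations 9 and 10 gives an equation for running speed in terms of the state
variables $(\ell, \dot{\ell}, \theta, \dot{\theta})$ at liftoff and the parameters
$(\omega_0, P, m, g)$:

$$V(\ell, \dot{\ell}, \theta, \dot{\theta}, \omega_0, P, m, g) = \frac{(\dot{\ell}^2 - \ell^2\dot{\theta}^2)\sin 2\theta + 2\ell\dot{\ell}\dot{\theta}\cos 2\theta + 2g\ell\sin\theta}{2\dot{\ell}\cos\theta - 2\ell\dot{\theta}\sin\theta + \dfrac{2g}{\omega_0}\arctan\left(\dfrac{\omega_0(\dot{\ell}\cos\theta - \ell\dot{\theta}\sin\theta)}{g\,(P/mg - 1)}\right)} \qquad (13)$$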
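Equation 13 is obtained by substitution, so it should agree with equation 12 and with the step-by-step definition $V = (S_s + S_f)/(T_s + T_f)$. The Python sketch below checks this numerically for the third case of equation 6 ($P > mg$). The liftoff state and mechanism parameters are hypothetical values chosen so that $\dot{z} > 0$.

```python
import math

G = 9.81

def speed_eq12(xdot, zdot, l, theta, omega0, P, m, g=G):
    """Equation 12 (third case of equation 6, P > m*g)."""
    arc = math.atan(omega0 * zdot / (g * (P / (m * g) - 1.0)))
    return (xdot * zdot + g * l * math.sin(theta)) / (zdot + (g / omega0) * arc)

def speed_eq13(l, ldot, theta, thetadot, omega0, P, m, g=G):
    """Equation 13: the same speed written directly in the liftoff state."""
    s, c = math.sin(theta), math.cos(theta)
    num = ((ldot**2 - (l * thetadot)**2) * math.sin(2.0 * theta)
           + 2.0 * l * ldot * thetadot * math.cos(2.0 * theta)
           + 2.0 * g * l * s)
    zdot = ldot * c - l * thetadot * s
    den = 2.0 * zdot + (2.0 * g / omega0) * math.atan(
        omega0 * zdot / (g * (P / (m * g) - 1.0)))
    return num / den

# Hypothetical liftoff state and mechanism parameters (P > m*g, zdot > 0).
l, ldot, theta, thetadot = 0.62, 2.0, math.radians(15.0), 4.0
omega0, P, m = 20.0, 150.0, 10.0

xdot = ldot * math.sin(theta) + l * thetadot * math.cos(theta)   # equation 9
zdot = ldot * math.cos(theta) - l * thetadot * math.sin(theta)   # equation 10

# Step-by-step: V = (S_s + S_f) / (T_s + T_f), from equations 5, 8, 6, and 7.
s_s, s_f = 2.0 * l * math.sin(theta), 2.0 * xdot * zdot / G
t_s = (2.0 / omega0) * math.atan(omega0 * zdot / (G * (P / (m * G) - 1.0)))
t_f = 2.0 * zdot / G
v_direct = (s_s + s_f) / (t_s + t_f)

print(v_direct, speed_eq12(xdot, zdot, l, theta, omega0, P, m),
      speed_eq13(l, ldot, theta, thetadot, omega0, P, m))
```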



The extent of the state space is determined by mechanism parameters that 
describe maximum excursions and velocities of the joints. The construction 
of the leg establishes minimum and maximum leg lengths, and the rate of 
change of leg length is less than some maximum. If the pitch angle is always 
zero, then the leg angle and leg rotation rate are limited by the maximum 
excursion and velocity of the hip joint. At liftoff the leg angle, leg rotation 
rate, and leg extension rate are all positive: 



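$$\begin{aligned}
\ell_{min} &\le \ell \le \ell_{max} \\
0 &< \dot{\ell} < \dot{\ell}_{max} \\
0 &< \theta < \phi_{max} < \pi/2 \\
0 &< \dot{\theta} < \dot{\phi}_{max}
\end{aligned} \qquad (14)$$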



The operating point, which is the state at liftoff, lies in this restricted region 
of the state space. 
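Equations 13 and 14 together pose a bounded optimization problem: find the operating point inside the restricted region that maximizes $V$. A brute-force grid search is the simplest (not the most efficient) way to explore it. In the Python sketch below, the leg-length limits of 0.50 m and 0.65 m echo the planar biped's actuator range mentioned in the Experiments section. The remaining limits, $\omega_0$, $P$, and $m$ are hypothetical. States that give no upward velocity at liftoff are skipped, because the flight formulas assume $\dot{z} > 0$.

```python
import itertools
import math

# Hypothetical limits for equation 14. The leg-length range echoes the planar
# biped's 0.50-0.65 m actuator range; the other limits are assumptions.
LIMITS = {"l_min": 0.50, "l_max": 0.65, "ldot_max": 2.0,
          "phi_max": math.radians(30.0), "phidot_max": 8.0}
OMEGA0, PRELOAD, MASS, G = 20.0, 150.0, 10.0, 9.81   # hypothetical, PRELOAD > MASS*G

def in_operating_region(l, ldot, theta, thetadot, limits):
    """Equation 14 with zero pitch: leg length, extension, angle and rate limits."""
    return (limits["l_min"] <= l <= limits["l_max"]
            and 0.0 < ldot < limits["ldot_max"]
            and 0.0 < theta < limits["phi_max"] < math.pi / 2
            and 0.0 < thetadot < limits["phidot_max"])

def speed(l, ldot, theta, thetadot, omega0, P, m, g=G):
    """Running speed of equation 13, evaluated through equations 9-12.

    Returns None when the hip is not rising at liftoff (zdot <= 0), because
    the flight formulas (equations 7 and 8) assume an upward liftoff velocity.
    """
    zdot = ldot * math.cos(theta) - l * thetadot * math.sin(theta)   # eq. 10
    if zdot <= 0.0:
        return None
    xdot = ldot * math.sin(theta) + l * thetadot * math.cos(theta)   # eq. 9
    t_stance = (2.0 / omega0) * math.atan(omega0 * zdot / (g * (P / (m * g) - 1.0)))
    return ((2.0 * l * math.sin(theta) + 2.0 * xdot * zdot / g)
            / (t_stance + 2.0 * zdot / g))

def best_operating_point(limits, omega0, P, m, steps=12):
    """Brute-force grid search over cell midpoints for the fastest admissible state."""
    def grid(lo, hi):
        return [lo + (hi - lo) * (i + 0.5) / steps for i in range(steps)]
    best = None
    for state in itertools.product(grid(limits["l_min"], limits["l_max"]),
                                   grid(0.0, limits["ldot_max"]),
                                   grid(0.0, limits["phi_max"]),
                                   grid(0.0, limits["phidot_max"])):
        if not in_operating_region(*state, limits):
            continue
        v = speed(*state, omega0, P, m)
        if v is not None and (best is None or v > best[0]):
            best = (v, state)
    return best

v_best, (l, ldot, theta, thetadot) = best_operating_point(LIMITS, OMEGA0, PRELOAD, MASS)
print(f"best V = {v_best:.2f} m/s at l={l:.3f} m, ldot={ldot:.2f} m/s, "
      f"theta={math.degrees(theta):.1f} deg, thetadot={thetadot:.2f} rad/s")
```

A finer grid or a gradient-based optimizer would refine the result. The coarse search is only meant to show how the region of equation 14 bounds the speed of equation 13.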






3.8 Summary of the Dependency Tree 

The top of the dependency tree shown in figure 1 is a performance measure, 
running speed. The intermediate rows are characteristics of the running 
motion: 

• step rate 

• step length 

• flight duration 

• stance duration 

• stance travel 

• flight travel 

• leg sweep angle 

• horizontal and vertical velocity at liftoff 

• natural frequency 

At the bottom of the tree are the parameters of the physical mechanism 
that determine running speed: 

• body attitude 

• leg length 

• hip position 

• leg extension rate 

• hip rotation rate 

• leg spring preload force 

• leg stiffness 

• mass 

The formulas in the dependency tree are based on the following assump- 
tions: 

• There is no air drag, so the horizontal speed during flight is constant.

• The center of gravity is fixed with respect to the body, so the mechanism 
moves like a rigid body during flight. 

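• The motion during stance is symmetric:

$$\ell_{td} = \ell_{lo}, \quad \theta_{td} = -\theta_{lo}, \quad x_{td} = -x_{lo}, \quad z_{td} = z_{lo},$$

$$\dot{\ell}_{td} = -\dot{\ell}_{lo}, \quad \dot{\theta}_{td} = \dot{\theta}_{lo}, \quad \dot{x}_{td} = \dot{x}_{lo}, \quad \dot{z}_{td} = -\dot{z}_{lo}.$$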

This structure provides a framework for studying how running speed de- 
pends on the operating point and on the physical mechanism parameters. 
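The symmetry conditions are consistent with equations 9 and 10 and with the hip position of figure 2. Mirroring the leg angle and the leg extension rate, while keeping the leg length and leg rotation rate fixed, leaves the horizontal velocity unchanged and reverses the vertical velocity. The Python sketch below checks this for a hypothetical liftoff state. The function names are illustrative.

```python
import math

def hip_position(l, theta):
    """Figure 2: hip position relative to the foot."""
    return l * math.sin(theta), l * math.cos(theta)

def hip_velocity(l, ldot, theta, thetadot):
    """Equations 9 and 10."""
    return (ldot * math.sin(theta) + l * thetadot * math.cos(theta),
            ldot * math.cos(theta) - l * thetadot * math.sin(theta))

def mirrored_touchdown(l, ldot, theta, thetadot):
    """Touchdown state implied by the symmetry assumption, given the liftoff state."""
    return l, -ldot, -theta, thetadot

liftoff = (0.62, 2.0, math.radians(15.0), 4.0)   # hypothetical liftoff state
touchdown = mirrored_touchdown(*liftoff)
xdot_lo, zdot_lo = hip_velocity(*liftoff)
xdot_td, zdot_td = hip_velocity(*touchdown)
x_lo, z_lo = hip_position(liftoff[0], liftoff[2])
x_td, z_td = hip_position(touchdown[0], touchdown[2])
print(xdot_lo, xdot_td, zdot_lo, zdot_td)
```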







The dependency tree provides a model of running speed. Although it in- 
corporates several simplifying assumptions, it provides more intuition about 
how running speed depends on mechanism parameters than more complex 
models would. In particular, dynamic simulations could predict the results
of experiments, but they have too many parameters to offer much insight into
the problem. The dependency tree models running with a small number of
parameters, and shows qualitatively how each affects speed. 

4 Experiments 

We have experimented with the planar biped to study how the design and 
control of a legged system affect its top running speed. The planar biped runs 
faster with long legs than with short legs, and faster with stiff leg springs 
than with soft leg springs. Experimenting with the biped has made it clear 
that there are speed dependent disturbances to body attitude, and that fast 
running requires that the control system reject or correct those disturbances. 
The biped’s power dissipation increases with running speed, but the increase 
is small compared with the power required just to run in place. 

4.1 How Fast Running Differs from Slow Running 

Figure 7 shows the motions of the legs and body as the biped ran forward at
constant speeds of 1, 3, and 5 m/s. Table 3 lists the properties
of a typical step at each speed. Compared to running slowly, running fast was 
characterized by longer and more frequent steps, higher frequency oscillations 
of leg length, leg angle, and body attitude, smaller and more frequent vertical 
oscillations of the body, and larger angular motions of the legs. 

5 Leg Length 

The longer a legged system’s legs are, the faster it can run. Figure 8 shows 
the results of nine experiments with the planar biped, each with a different 
leg length. During flight, the control system servoed the active leg to the
indicated length. During stance, the leg actuator extended, so the leg was 
longer at liftoff than it was at touchdown. During each experimental run, 
I used a joystick to increase the desired running speed, attempting to find 
the highest speed at which the biped would run without losing its balance. 
The reported speed for each run is the highest average speed for one lap of 
the 16 m running track. 

The control system could select leg lengths between 0.50 m and 0.65 m
by adjusting the leg actuator. For longer leg lengths, the biped’s legs were
extended with stilts joined to the bottom of the legs, and the feet were moved
to the bottom of the stilts. A 0.191 m stilt gave a leg length of 0.844 m, and a
0.391 m stilt gave a leg length of 1.005 m. Figure 8 shows that the top running
speed of the biped increased with increasing leg length, but may flatten out
above 0.8 m.

[Figure 7 axes: hip height (m), pitch (degrees), leg angle (degrees), and spring length (m), each versus time (s)]

Figure 7: The planar biped ran forward at 1, 3, and 5 m/s. The top three
graphs show the forward speed, hip height, and pitch angle of the body. The
fourth graph shows the equal and opposite motions of the two legs as they sweep
back and forth. The bottom graph shows how the two leg springs compressed as
the biped bounced alternately on the two feet. As running speed increased, the
frequency of stepping and leg angle excursion increased, and the compression
of the legs during the bounce decreased.

Step Parameters at 1, 3, 5 m/s

commanded speed (m/s)                          1.0      3.0      5.0
observed speed (m/s)                           0.90     2.92     4.99
stance duration (s)                            0.128    0.128    0.088
flight duration (s)                            0.588    0.392    0.320
step period (s)                                0.716    0.520    0.320
stance travel (m)                              0.10     0.34     0.43
flight travel (m)                              0.55     1.18     1.17
step length (m)                                0.65     1.52     1.60
leg length (m), touchdown                      0.595    0.595    0.591
leg length (m), liftoff                        0.666    0.658    0.620
leg angle w.r.t. vertical (°), touchdown      -2.2    -11.1    -19.5
leg angle w.r.t. vertical (°), liftoff         7.4     21.9     25.2
body angle w.r.t. horizontal (°), touchdown    1.3      0.0     -4.1
body angle w.r.t. horizontal (°), liftoff      1.3     -0.2     -2.5
vertical velocity (m/s), touchdown            -2.95    -2.38    -1.60
vertical velocity (m/s), liftoff               3.00     2.12     1.54
horizontal velocity (m/s), touchdown           0.75     2.58     5.02
horizontal velocity (m/s), liftoff             0.86     2.97     4.97

Table 3: Each column of data is for a typical step on leg two of the planar biped
while it was running at constant speed. Increasing speed was accompanied by
longer steps and shorter step periods, as shown by these data. During these
experiments, the air pressure in the leg springs was 90 psi and the thrust
algorithm extended the leg actuator as quickly as possible during stance.
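Table 3 can be cross-checked against equations 2 and 3. The Python sketch below divides step length by step period and sums stance and flight travel. The dictionary layout is only a convenient transcription of the table. Both checks reproduce the tabulated observed speeds and step lengths to within rounding.

```python
# Transcription of selected Table 3 rows, keyed by commanded speed (m/s).
table3 = {
    1.0: dict(observed=0.90, stance_travel=0.10, flight_travel=0.55,
              step_length=0.65, step_period=0.716),
    3.0: dict(observed=2.92, stance_travel=0.34, flight_travel=1.18,
              step_length=1.52, step_period=0.520),
    5.0: dict(observed=4.99, stance_travel=0.43, flight_travel=1.17,
              step_length=1.60, step_period=0.320),
}

for commanded, row in table3.items():
    v = row["step_length"] / row["step_period"]            # equation 2: V = S/T
    s = row["stance_travel"] + row["flight_travel"]        # equation 3: S = S_s + S_f
    print(f"{commanded:.0f} m/s: S/T = {v:.2f} m/s (observed {row['observed']:.2f}), "
          f"S_s + S_f = {s:.2f} m (table {row['step_length']:.2f})")
```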

The increasing leg length of the biped was not accompanied by other 
changes specified by any principle of geometric or elastic similarity. No di- 
mension other than leg length changed. Since the diameter of the legs did not 
change, the strength of the legs remained the same, and the factor of safety 
against structural failure got smaller as the legs got longer. Figure 9 shows 
one of the consequences. After several running experiments at the longest leg 
length, one of the stilts broke where it was attached to the leg. The bending 
force on the leg had torn the stilt where it was fastened to the leg. 








Figure 8: The longer the legs, the faster the planar biped ran. The planar 
biped ran nine times, each time with a different leg length. During flight, the 
control system adjusted the length of the leg. In each run, the experimenter 
raised the forward running speed to the highest value that could be maintained 
without the biped losing its balance. The listed speed is the average for a 
complete circuit around the 16 m circular track. Initially the range of possible 
leg lengths was 0.50 m to 0.65 m. Longer leg lengths were obtained by adding 
stilts to the end of the biped’s legs. A 0.191 m stilt gave a leg length of 0.844 m,
and a 0.391 m stilt gave a leg length of 1.005 m. During the experiments with
the stilts, the leg actuator extended as fast as possible during stance, rather
than trying to extend 0.021 m as in the other experiments. With a leg length
of 0.844 m, the biped ran 5.9 m/s (13.1 mph), the highest speed recorded for
the planar biped.

The leg springs did not get any longer when the legs got longer, and that 
may have caused a problem. If the body were to move horizontally during 
stance, then the leg length when the leg was vertical would be $\ell\cos\theta$, where
$\ell$ and $\theta$ are the leg length and leg angle at touchdown. So when the leg was
vertical, the leg spring would be deflected at least $\ell(1 - \cos\theta)$ from its length
at touchdown. The actual deflection of the spring would be greater, since the 
path of the body is concave upward, so that the hip is always lower in the 
middle of stance than at touchdown or liftoff. The air springs on the biped 
have a maximum deflection of less than 0.10 m. For a leg angle of 25°, a leg 
length of 0.66 m would require the spring to deflect at least 0.06 m, but for a 
leg length of 0.98 m, the required deflection would be more than 0.09 m. The 
limit on leg spring deflection probably prevented the biped from using its full 
hip travel when it was running with the long stilts. 
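The arithmetic behind the two deflection figures is easy to reproduce. The Python sketch below evaluates the lower bound $\ell(1 - \cos\theta)$ at a 25° leg angle for the two leg lengths in the text. It compares each result with the 0.10 m bound on air-spring travel. Like the text, it ignores the extra deflection caused by the concave-upward path of the body. The function name is illustrative.

```python
import math

def min_spring_deflection(leg_length, leg_angle_rad):
    """Lower bound on leg-spring deflection when the leg passes vertical.

    If the hip moved horizontally, the vertical leg would be l*cos(theta) long,
    so the spring must shorten by at least l*(1 - cos(theta)).
    """
    return leg_length * (1.0 - math.cos(leg_angle_rad))

MAX_DEFLECTION = 0.10   # m, the air springs deflect less than this (from the text)
theta = math.radians(25.0)
for l in (0.66, 0.98):
    d = min_spring_deflection(l, theta)
    print(f"l = {l:.2f} m: deflection >= {d:.3f} m "
          f"({d / MAX_DEFLECTION:.0%} of the {MAX_DEFLECTION:.2f} m bound)")
```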

When the long stilts were on the machine, the hip servo position gain 
had to be reduced to keep the servo from oscillating. Lengthening the legs 
increased their moment of inertia and their flexibility, both of which lowered 
the natural frequency of the first mode of vibration of the leg. Lowering the 
position gain softened the servo so that it did not excite the vibration of the 
leg. 








Figure 9: One of the long stilts broke after several running experiments. The 
wall of the tubing tore where it was screwed onto the plug that joined the stilt 
to the leg tube. Long legs are more vulnerable to buckling failure than short 
legs. 

The planar biped runs faster with long legs than with short legs. The bro- 
ken stilt and the need to soften the hip servo point out some of the problems 
that accompany increasing leg length. 

6 Leg Stiffness 

The stiffer the leg springs are, the faster a legged system can run. Figure 10 
shows the results of eight experimental runs with the planar biped, each at 
a different leg stiffness. Before each experimental run, I set the indicated 
pressure with the regulator that supplies air to the pneumatic leg springs.
During the runs, I used a joystick to gradually increase the desired running
speed to find the highest speed at which the biped would run without losing 
its balance. The reported speed for each run is the highest average speed 
for one lap of the 16 m circumference running track. The reported stance 
duration is the average of the stance durations observed during the fastest 
lap. 



[Figure 10 plots: (a) top running speed versus leg-spring air pressure (psi); (b) stance duration versus air pressure (psi)]



Figure 10: These plots show the variation of top running speed (a) and of 
stance duration (b) as a function of the air pressure in the leg springs. Each 
point represents the average stance duration or forward speed over the fastest 
lap at the given air pressure. To generate the data the planar biped ran eight 
times, each with a different air pressure in the leg springs. In each run, the 
experimenter raised the forward running speed to the highest value that could 
be maintained without the biped losing its balance. The reported forward 
speed, stance duration, and vertical landing velocity are average values for a 
complete circuit around the 16 m circular running track. During these runs, 
the leg length at touchdown was 0.623 m, and the thrust algorithm extended 
the leg actuator as rapidly as possible during stance. 



The leg stiffness of the planar biped depends on how much air is in the 
leg springs. An air line leads from the spring chamber to a regulator that 
maintains the desired pressure in the line. A check valve isolates the spring 
chamber when pressure inside is higher than the pressure in the line. If air 
leaks out of the spring while it is compressed, then when the spring extends 
the pressure inside drops below the pressure in the air line. In this case the 
check valve opens, restoring the spring pressure to the desired value. The 
higher the air pressure, the more air there is inside of the leg spring. Higher 
air pressure increases both the stiffness and the preload force of the spring. 

[Figure 11 plots: leg angles and body angle vs. time (seconds)]

Figure 11: A control algorithm that kept the leg angles equal and opposite
reduced the amplitude of the oscillations in body attitude. The top graph
shows the angle of each leg with respect to the axis of symmetry of the body.
The sign of the angle of leg 2 is reversed so that when the leg angles are
symmetric the lines are on top of one another. The vertical line marks a switch
from an algorithm that moved the legs independently to one that ensured that
the leg angles were mirror images. The axis of symmetry was a line passing
through the hip joint perpendicular to the body. The bottom graph shows that
changing the leg positioning algorithm reduced the oscillations in body angle
from about 20° peak-to-peak to about 6° peak-to-peak. In this experiment the
planar biped was running at about 2.5 m/s (5.6 mph).

Figure 10 shows that the biped ran faster and took steps with shorter stance
duration when it was running with high leg spring air pressure than when
it was running with low leg spring air pressure. The stiffness of the legs,
and thus the natural frequency of the bouncing motion, increased with the
pressure.
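A linear spring-mass model can illustrate the link between leg stiffness, bounce frequency and stance duration seen in Figure 10. In that model, stance lasts roughly half the natural period, $t_s \approx \pi\sqrt{m/k}$. It is exactly half only if gravity is neglected during stance. The model is idealized (linear spring, vertical bouncing, no damping), and the mass and stiffness values are hypothetical rather than measurements of the planar biped.

```python
import math


def natural_frequency_hz(stiffness_n_per_m: float, mass_kg: float) -> float:
    """Natural frequency of a linear spring-mass system."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


def approx_stance_duration_s(stiffness_n_per_m: float, mass_kg: float) -> float:
    """Half the natural period, pi * sqrt(m / k): the idealized time spent on the spring."""
    return math.pi * math.sqrt(mass_kg / stiffness_n_per_m)


mass = 20.0  # hypothetical mass, kg
for k in (5_000.0, 10_000.0, 20_000.0):  # hypothetical leg stiffnesses, N/m
    f = natural_frequency_hz(k, mass)
    ts = approx_stance_duration_s(k, mass)
    print(f"k = {k:6.0f} N/m: f = {f:4.2f} Hz, stance ~ {ts * 1000:4.0f} ms")
```

In this model, quadrupling the stiffness halves the stance duration.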

7 Body Attitude 

Body attitude is important to top running speed because of the limited range 
of motion of the hip joints. If the body tips forward during flight, the hip joint 
limit prevents the foot from reaching as far forward for landing as it can with 
a level body. The distance that the foot needs to reach forward for landing 
is proportional to running speed, so forward tipping of the body reduces the 
top running speed. If the body tips backward during stance, the hip joint 
limit prevents the foot from reaching as far backward as it can with a level 
body. In this case, the hip may reach the joint limit before the leg leaves the 
ground, causing a sudden forward pitching of the body. Top running speed 
requires good control of body attitude so the hip joint can sweep through its 
full range of motion during stance. If there were no kinematic limits to hip 
angle, then body attitude would not affect running speed. 
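A rough worked example shows the effect, under two assumptions. First, the foot must land about $\dot x T_s/2$ ahead of the hip; this is the "neutral point" rule of thumb from Raibert's hopping work, used here only as an assumption. Second, the hip can swing the leg at most $\theta_{max}$ forward of the body axis. A forward body pitch $\phi$ then leaves $\theta_{max} - \phi$ of usable range, so the top speed is roughly $2r\sin(\theta_{max} - \phi)/T_s$. The leg length, stance time and joint limit below are hypothetical.

```python
import math


def required_foot_reach(speed_mps: float, stance_time_s: float) -> float:
    """Neutral-point rule of thumb (assumed): land about half a stance stride ahead of the hip."""
    return speed_mps * stance_time_s / 2.0


def top_speed_mps(leg_length_m: float, stance_time_s: float,
                  hip_limit_rad: float, forward_pitch_rad: float) -> float:
    """Largest speed whose required reach fits inside the hip range left after pitching forward."""
    usable = hip_limit_rad - forward_pitch_rad  # forward pitch uses up part of the joint range
    if usable <= 0.0:
        return 0.0
    max_reach = leg_length_m * math.sin(usable)
    return 2.0 * max_reach / stance_time_s


# Hypothetical numbers: 0.8 m leg, 0.15 s stance, 30 degree forward hip limit.
for pitch_deg in (0.0, 5.0, 10.0):
    v = top_speed_mps(0.8, 0.15, math.radians(30.0), math.radians(pitch_deg))
    print(f"pitch {pitch_deg:4.1f} deg -> top speed ~ {v:4.2f} m/s")
```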






7.1 Mirroring 

Hip torques that position the legs also rotate the body. The control system 
can minimize disturbances to the body attitude by ensuring that the two legs 
move at the same time and in opposite directions. During stance, while the 
support leg sweeps backward, the other leg swings forward. During flight, 
while one leg is positioned for landing, the other leg makes compensating
motions to reduce the torques on the body. This mirroring action substantially
reduces the variation in body attitude that occurs when each leg is moved 
independently. 

Figure 11 shows the result of an experiment comparing two different algo- 
rithms for moving the idle leg. At the time marked by the vertical dotted 
line, the control system switched algorithms. Before that time, the legs were 
positioned independently, and afterwards they were positioned according to 
the mirroring algorithm. The mirroring algorithm reduced the oscillations in 
pitch angle, from about 20° peak-to-peak to about 6° peak-to-peak. 

When the two legs were being positioned independently, the algorithm was 
as follows: during stance, the stance leg was swept back by the forward 
motion of the body, and by hip torques selected by the body attitude control 
servo. A leg angle servo moved the swing leg forward into position for the 
next touchdown. During flight, leg angle servos positioned the leg that would 
touch down next for landing, and servoed the leg that had just lifted off to the 
angle it had at liftoff. The leg angle servos were as stiff as possible, in order to 
minimize steady state error. The swing leg moved forward very quickly and 
stopped at the desired position well before the end of stance. After liftoff, 
the leg that had just left the ground was still rotating, so the servo applied 
torques to stop the rotation and move the leg back to the position it had at 
liftoff. These torques disturbed the body attitude. 

The mirroring algorithm servoed the idle leg so that its hip angle was equal 
and opposite to that of the active leg. During stance, the support leg was 
active, and swept back as the body moved forward. The swing leg was idle, 
and moved forward at the same rate as the support leg moved back. During 
flight, the leg that would next touch down was active, and was positioned for 
landing as usual. The leg that had just lifted off was idle, and made motions 
symmetric to the active leg. Compared with the independent positioning 
algorithm, mirroring caused the swing leg to advance more slowly, which 
reduced the disturbance to body attitude. During flight, mirroring caused the 
reaction torques generated by positioning one leg to be canceled by torques 
from moving the other leg. 
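The two idle-leg policies can be contrasted in a few lines. This is a sketch only: the function names are hypothetical, the original control code is not given, and each servo is reduced to the setpoint it tracks. Hip angles are measured from the body's axis of symmetry, as in Figure 11. In the output, the mirrored swing leg advances gradually as the support leg sweeps back. The independent policy commands the touchdown angle at once.

```python
from enum import Enum


class Phase(Enum):
    STANCE = "stance"
    FLIGHT = "flight"


def idle_setpoint_mirroring(active_leg_angle: float) -> float:
    """Mirroring: servo the idle leg to the negative of the active leg's hip angle."""
    return -active_leg_angle


def idle_setpoint_independent(phase: Phase, touchdown_angle: float, liftoff_angle: float) -> float:
    """Independent positioning, as described in the text: during stance the swing leg
    goes straight to its touchdown angle; during flight the leg that just lifted off
    is returned to the angle it had at liftoff."""
    return touchdown_angle if phase is Phase.STANCE else liftoff_angle


def linear_sweep(start: float, end: float, samples: int) -> list:
    """Illustrative support-leg hip angle sweeping back during stance."""
    return [start + (end - start) * i / (samples - 1) for i in range(samples)]


support = linear_sweep(0.3, -0.3, 5)  # radians, hypothetical
mirrored = [idle_setpoint_mirroring(a) for a in support]
independent = [idle_setpoint_independent(Phase.STANCE, 0.3, -0.3) for _ in support]
print("support    ", [round(a, 2) for a in support])
print("mirrored   ", [round(a, 2) for a in mirrored])      # advances gradually
print("independent", [round(a, 2) for a in independent])   # commanded to 0.3 at once
```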







7.2 Sweep Compensation 

The hip actuators on the planar biped have internal damping that causes a 
velocity dependent discrepancy between the commanded force and the deliv- 
ered force. When the biped runs fast, it leans forward until the body attitude 
servo commands enough force to overcome the actuator damping. Explicitly 
compensating for the velocity dependent forces reduces the tendency to lean 
forward. 

Figure 12 shows the body angle during an experimental run in which the 
control system gradually increased the running speed from zero to 4 m/s. 
The graphs show that the average body angle increased as running speed 
increased. 

When the biped runs in place, the hips barely move. When it runs forward, 
each hip rotates one way as the leg sweeps back during stance and the other 
way as the leg swings forward in preparation for the next step. These hip 
joint velocities are proportional to running speed. Each biped hip actuator 
behaves like a torque source in parallel with a damper. For a given input 
signal, the actuators produce less force when they are moving quickly than 
when they are moving slowly. The damping forces are proportional to hip 
rotation rate, which is proportional to running speed. 

The hip torque to overcome the actuator damping comes from the body 
attitude servo. During stance, the servo applies torques proportional to the 
angle and angular velocity of the body. The angle stabilizes when the body 
has leaned forward enough that the attitude servo generates a correcting 
torque equal to the torque caused by the actuator damping force. Because 
the force is proportional to running speed, the body leans farther forward as 
the biped runs faster. 
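The equilibrium argument can be made concrete under some assumptions, all hypothetical. The attitude servo contributes a proportional torque $k_p\phi$. The actuator damping torque is $b\,\omega$, and the hip rate is $\omega = c\,v$, proportional to running speed $v$. The pitch-rate term is taken to average out over a step. The steady lean is then $\phi = b c v / k_p$, which grows linearly with speed. The gains are illustrative, not identified from the robot.

```python
def steady_lean_deg(speed_mps: float, damping_nms_per_rad: float,
                    hip_rate_per_speed: float, kp_nm_per_deg: float) -> float:
    """Body lean at which the proportional attitude torque equals the actuator damping torque.

    Assumes hip rate = hip_rate_per_speed * speed and a pure damper b * omega; the
    pitch-rate term is taken to average out over a step.
    """
    damping_torque = damping_nms_per_rad * hip_rate_per_speed * speed_mps
    return damping_torque / kp_nm_per_deg


# Illustrative gains only (not identified from the robot).
for v in (1.0, 2.0, 4.0):
    print(f"{v:.1f} m/s -> lean {steady_lean_deg(v, 5.0, 2.0, 10.0):.1f} deg")
```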

Compensating for the velocity dependent torques reduces the lean of the 
body. Figure 13 shows the body angle during an experimental run in which 
the body attitude servo added negative damping to the hip actuators by 
feeding back a signal proportional to the actuator velocity. The signal to the 
servovalve was: 

$$r = k_p\,p + k_{\dot p}\,\dot p + k_w\,w \qquad (15)$$

where $p$ and $\dot p$ are the pitch angle and the pitch rate, $w$ is the hip actuator
velocity, $k_p$ and $k_{\dot p}$ are the position and velocity gains that control pitch,
and $k_w$ is the inverse damping coefficient. The modification reduced the body
angle offset from about 7° to about 3°. The same result might have been 
obtained by adding an integral term to the body attitude servo. 
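Equation (15) can be paired with the torque-source-plus-damper actuator model described above to show why velocity feedback helps. The actuator gain `k_act`, the damping `b` and the choice $k_w = b/k_{act}$ are assumptions for illustration. With that choice, the velocity-dependent loss cancels exactly, and the delivered torque depends only on the pitch terms.

```python
def servovalve_signal(p: float, p_dot: float, w: float,
                      kp: float, kp_dot: float, kw: float) -> float:
    """Equation (15): pitch position and rate feedback plus hip actuator velocity feedback."""
    return kp * p + kp_dot * p_dot + kw * w


def delivered_torque(r: float, w: float, k_act: float, b: float) -> float:
    """Hip actuator as a torque source (gain k_act) in parallel with a damper b (assumed model)."""
    return k_act * r - b * w


k_act, b = 2.0, 1.0                 # hypothetical actuator gain and damping
kp, kp_dot = 10.0, 1.0              # hypothetical pitch gains
p, p_dot, w = 0.1, 0.0, 3.0         # pitch, pitch rate, hip actuator velocity

uncompensated = delivered_torque(servovalve_signal(p, p_dot, w, kp, kp_dot, 0.0), w, k_act, b)
compensated = delivered_torque(servovalve_signal(p, p_dot, w, kp, kp_dot, b / k_act), w, k_act, b)
print(uncompensated, compensated)  # damping eats the torque unless kw = b / k_act
```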




[Figure 12 plot: body angle vs. speed (m/s)]



Figure 12: The planar biped leans forward when it runs fast. The graph shows 
body angle plotted as a function of running speed. The body angle oscillated 
during each step. As the running speed increased, the peak-to-peak amplitude 
increased with a slope of about 2.0° s/m. Similarly, the average body angle
increased with a slope of about 2.2° s/m. At 4 m/s the offset was 7° and the
peak-to-peak amplitude was 5°.



[Figure 13 plot: body angle vs. speed (m/s)]



Figure 13: Compensating for velocity dependent forces in the hip actuators
reduced the average body angle, but did not reduce the amplitude of oscillation.
As in Figure 12, the body angle took on an offset and an oscillation as the
running speed increased. The increase in offset was about 1.5° s/m. At 4 m/s
the offset was about 3° and the peak-to-peak amplitude was about 6°.



[Figure 14 plots: power vs. speed (m/s)]



Figure 14: The planar biped dissipates slightly more energy to run fast than 
it does to run slowly. The top two graphs show the instantaneous power and 
the running speed as the biped accelerated from rest to 4 m/s. The bottom 
graph shows power plotted as a function of running speed. The crosses show 
the peak instantaneous power on each step, and the circles show the average 
power for each step. The peak power was a nearly constant 6.6 kW, increasing 
very slightly at speeds of about 4 m/s. The average power increased gradually, 
from 1.8 kW for hopping in place, to 2.7 kW for running 4 m/s. 

8 Power Dissipation 

The planar biped dissipates slightly more power when it runs fast than when 
it runs slowly. Figure 14 shows the power dissipated during an experimen- 
tal run in which the control system gradually increased the running speed 
from zero to 4 m/s. The peak power was nearly constant, increasing very 
slightly at 4 m/s. The average power increased gradually as running speed 
increased. The instantaneous power shown is the product of the supply pres- 
sure measured by a sensor on the robot, and the total flow computed from 
the actuator velocities. The flow computation included an estimate of the 
flow that leaked through the servovalves, which does not appear in the
actuator velocities. The estimated leakage was about 39 cc/s, which at a system
pressure of 3000 psi corresponds to 0.8 kW. The hydraulic pump maintained 
a nearly constant pressure, so the power was proportional to the flow. 
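The leakage figure can be checked with hydraulic power = pressure × volumetric flow. Here is a minimal sketch that converts psi and cc/s to SI units (1 psi ≈ 6894.757 Pa, 1 cc = 10⁻⁶ m³):

```python
PA_PER_PSI = 6894.757   # pascals per psi
M3_PER_CC = 1e-6        # cubic metres per cubic centimetre


def hydraulic_power_w(pressure_psi: float, flow_cc_per_s: float) -> float:
    """Hydraulic power P = p * Q for a flow at (nearly) constant supply pressure."""
    return pressure_psi * PA_PER_PSI * flow_cc_per_s * M3_PER_CC


leak_w = hydraulic_power_w(3000.0, 39.0)
print(f"servovalve leakage: {leak_w / 1000:.2f} kW")  # prints 0.81 kW
```

The result, about 807 W, agrees with the 0.8 kW quoted for the servovalve leakage.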

A 7.5 kW motor drives the hydraulic pump. The peak instantaneous power 
of 6.6 kW probably exceeds what the pump can deliver. However, the 5 gal 
hydraulic accumulator, the compliance of the hydraulic hoses, and the inertia 




267 



of the oil all filter out flow transients, so the pump never has to deliver the 
peak power. The average power dissipation was less than 3kW, which is well 
within the capacity of the pump and motor. The biped’s running speed is 
not currently limited by the available power. 

We measured the power dissipation during a pair of experiments in which 
the planar biped ran using two different gaits. In the first experiment, the 
biped ran with its usual alternating two-legged gait. In the second experi- 
ment, it ran by hopping on one leg. The second leg stayed short, and moved 
back and forth to compensate for the reaction torques of the active leg. 

The biped ran at about the same speed with both gaits, but dissipated less 
energy when it ran on only one leg. On two legs it ran 5.3 m/s and dissipated
3.5 kW, with instantaneous peaks of 7.2 kW. On one leg it ran 5.4 m/s and
dissipated 2.9 kW, with instantaneous peaks of 6.0 kW.

Running on one leg required the legs to sweep back and forth twice as 
frequently as they did in two-legged running, so the hip actuators dissipated 
more power. On the other hand, running on one leg meant that the other 
leg never had to change length. The leg actuator has a large area and a long 
stroke, so moving it causes a large flow that dissipates a lot of power without 
doing any work. Keeping one leg short and not moving its actuator saved 
more than enough energy to compensate for the increased dissipation of the 
hip actuators. 

The biped’s hydraulic system is very inefficient for applying small forces 
at high velocities. The pump supplies oil at constant pressure, so the power 
supplied is proportional to the flow of oil. To apply a small force, an actuator 
throttles the oil down to a lower pressure, dissipating energy. That the biped 
dissipated less power running on one leg than running on two is an artifact 
of the constant pressure hydraulic system. 
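A simple sketch quantifies the throttling loss. It assumes an ideal constant-pressure supply feeding an actuator of piston area $A$. The supply then delivers $p_s A v$ while the load absorbs only $F v$, so the efficiency is $F/(p_s A)$ regardless of speed. An actuator that moves without doing work (F = 0) dissipates everything it draws. The piston areas and forces are hypothetical, and leakage is ignored.

```python
def throttling_efficiency(force_n: float, supply_pressure_pa: float, piston_area_m2: float) -> float:
    """Useful fraction of constant-pressure hydraulic power for a throttled actuator.

    Supplied power is p_s * A * v and useful power is F * v, so velocity cancels.
    """
    max_force = supply_pressure_pa * piston_area_m2
    if not 0.0 <= force_n <= max_force:
        raise ValueError("force must lie between 0 and p_s * A")
    return force_n / max_force


def throttling_loss_w(force_n: float, velocity_mps: float,
                      supply_pressure_pa: float, piston_area_m2: float) -> float:
    """Power dissipated across the valve: supplied p_s * A * v minus useful F * v (v >= 0)."""
    supplied = supply_pressure_pa * piston_area_m2 * velocity_mps
    return supplied - force_n * velocity_mps


p_s = 3000 * 6894.757  # about 20.7 MPa, the system pressure quoted above
area = 5e-4            # hypothetical piston area, m^2
print(throttling_efficiency(1000.0, p_s, area))   # small force: low efficiency
print(throttling_loss_w(0.0, 0.5, p_s, area))     # moving with no load: all supplied power is lost
```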

9 Summary 

The running speed of a legged system depends upon the frequency and length 
of its steps. The time required for a step can be reduced by stiffening the 
legs, and the step length can be increased by lengthening the legs. If body 
attitude is not well controlled, the limited range of motion of the hips limits 
the length of the steps. 

Experiments with the planar biped showed that it runs faster with stiff legs 
than with soft legs, and that it runs faster with long legs than with short legs. 
To get it to run fast, the control system reduces variation of body attitude by 
moving the legs symmetrically, and by compensating for velocity dependent 
hip actuator forces. The biped’s power dissipation is well within the capacity 
of its power supply. 







During its fastest run the planar biped ran 5.9 m/s (13.1 mph) on long, 
stiff legs. The leg length at landing was 0.844 m, and the air spring pressure 
was 85 psi. The control system moved the legs symmetrically, and compensated
for hip actuator damping forces.

Acknowledgements 

The authors would like to thank Jessica Hodgins for her help with this re- 
search, and Ben Brown and Jeff Miller for their help in designing and building 
the planar biped. This research has been supported by funds from the De- 
fense Advanced Research Projects Agency, System Development Foundation, 
and the National Science Foundation. 

References 

R. McN. Alexander, V. A. Langman, and A. S. Jayes. 1977. Fast locomotion of
some African ungulates. J. Zoology (London) 183:291-300.

R. McN. Alexander and A. Vernon. 1975. The mechanics of hopping by kangaroos
(Macropodidae). J. Zoology (London) 177:265-303.

R. McN. Alexander. 1988. Elastic Mechanisms in Animal Movement. Cambridge:
Cambridge University Press.

T. J. Dawson and C. R. Taylor. 1973. Energetic cost of locomotion in kangaroos. 
Nature 246:313-314. 

J. Furusho, M. Masubuchi. 1987. Control of a dynamical biped locomotion system 
for steady walking. In H. Miura, I. Shimoyama, editors, Study on Mechanisms 
and Control of Bipeds, Tokyo: University of Tokyo, 116-127. 

G. Gabrielli and T. H. von Kármán. 1950. What price speed? Mechanical
Engineering 72:775-781.

V. S. Gurfinkel, E. V. Gurfinkel, A. Yu. Shneider, E. A. Devjanin, A. V. Lensky, 
L. G. Shitilman. 1981. Walking robot with supervisory control. Mechanism and 
Machine Theory 16:31-36. 

A. V. Hill. 1950. The dimensions of animals and their muscular dynamics. Science 
Progress 38:209-230. 

S. Hirose. 1984. A study of design and control of a quadruped walking vehicle. 
International J. Robotics Research 3:113-133. 

D. F. Hoyt and C. R. Taylor. 1981. Gait and the energetics of locomotion in horses. 
Nature 292:239-240. 

M. Huang and K. J. Waldron. 1987. Relationship between payload and speed in
legged locomotion. In Proceedings of Conference on Robotics and Automation,
IEEE, Raleigh, NC. 533-538.

T. Kato, A. Takanishi, H. Jishikawa, I. Kato. 1983. The realization of the quasi- 
dynamic walking by the biped walking machine. In A. Morecki, G. Bianchi, 
K. Kedzior, editors, Theory and Practice of Robots and Manipulators , Proceedings 
of RoManSy’81, Warsaw: Polish Scientific Publishers. 341-351. 







T. A. McMahon. 1975. Using body size to understand the structural design of 
animals: quadrupedal locomotion. J. Appl. Physiol. 39:619-627. 

T. A. McMahon and P. R. Greene. 1978. Fast running tracks. Scientific American 
239:148-163. 

T. A. McMahon and P. R. Greene. 1979. The influence of track compliance on 
running. J. Biomechanics 12:893-904. 

T. A. McMahon. 1985. The role of compliance in mammalian running gaits. J. Exp.
Biol. 115:263-282.

T. A. McMahon, G. Valiant, and E. C. Frederick. 1986. Groucho running. J. Appl.
Physiol. 62:2326-2337.

H. Miura, I. Shimoyama. 1984. Dynamic walk of a biped. International J. Robotics 
Research 3:60-74. 

M. H. Raibert. 1986. Legged Robots That Balance. MIT Press. 

I. E. Sutherland, M. K. Ullner. 1984. Footprints in the asphalt. International J. 
Robotics Research 3:29-36. 

. Russel. 1983. ODEX I: The first functionoid. Robotic Age. 5(5):12-18. 

R. Taylor, N. C. Heglund, and G. M. O. Maloiy. 1982. Energetics and mechanics
of terrestrial locomotion: I. Metabolic energy consumption as a function of speed
and body size in birds and mammals. J. Exp. Biol. 97:1-21.

K. J. Waldron, V. J. Vohnout, A. Pery, and R. B. McGhee. 1984. Configuration
design of the adaptive suspension vehicle. International J. Robotics Research
3:37-48.




Robot Biped Walking Stabilized with Trunk Motion 



Atsuo Takanishi 

Department of Mechanical Engineering, Waseda University, Ookubo 
Shinjuku-ku, Tokyo, Japan 



Abstract 

When walking in different environments, a biped walking robot must vary its
gait (walking period, step length, etc.) according to the environment. To
realize this, I devised a universal control method for dynamic biped walking,
stabilized with trunk motion, on a disturbance-free flat floor, for a biped
walking robot that has a trunk serving as a balancing aid. The control
method consists of two main parts. One is an algorithm that automatically
computes the balancing motion of the trunk from the motion of the lower-limbs
and a trajectory of the ZMP (Zero Moment Point) planned arbitrarily before the
robot begins walking. The other is program control of the walking using preset
walking patterns transformed from the motion of the lower-limbs and the trunk.
In 1986, to confirm the effect of the control method, my coworkers and I
developed the biped walking robot WL-12 (Waseda Leg - 12), which has a trunk,
and applied the control method to it. The WL-12 realized several different gaits
in dynamic walking stabilized by its trunk motion on a disturbance-free
flat floor. The shortest walking time in these experiments was 1.3 [sec] per
step and the maximum step length was 0.3 [m]. After that, we refined the WL-12
and renamed it WL-12R (Refined). In 1988, the WL-12R succeeded in achieving
faster walking of 0.8 [sec] per 0.3 [m] step.

With a small modification, the control method can effectively deal with any
external forces and moments acting on the walking robot, which can be regarded
as disturbances, provided they are known before the robot begins walking. We
aimed at realizing dynamic biped walking stabilized with trunk motion under
known external forces by applying this modified method to the biped walking
robot WL-12R. We developed a system that uses a DD (Direct Drive) motor to
generate the external force that acts on the robot while it is walking. In the
experiments, the WL-12R realized various gaits under known external force
in dynamic walking stabilized with trunk motion on a flat floor. The maximum
external force was 10 [kgf].







1 Introduction 

In 1984, my coworkers and I succeeded in attaining dynamic walking on a flat
floor using a biped walking robot, the WL-10RD (Waseda Leg - 10 Refined
Dynamic)[1]. The walking speed was 1.3 [sec/step] with a 0.4 [m] step. Moreover,
in 1985, we accomplished ascending and descending stairs and slight inclines
using the same robot.

However, for walking in different environments, I believe that a biped
walking robot must vary its gait (walking period, step length, etc.)
according to the environment. From this point of view, a biped walking robot
that consists of only the lower-limbs does not have the capability to change its
gait.

Therefore, I devised a universal control method for dynamic biped walking,
stabilized with trunk motion, on a disturbance-free flat floor, for a biped
walking robot that has a trunk as a balancing aid. In 1986, to confirm the effect
of the control method, my coworkers and I developed a biped walking robot, the
WL-12 (Waseda Leg - 12), which has a trunk, and applied the control method to
the WL-12.

In the experiments, the WL-12 realized various gaits in dynamic
walking stabilized with trunk motion on a disturbance-free flat floor[2]. The
shortest walking time was 1.3 [sec] per step and the maximum step length
was 0.3 [m].

After the experiments, we improved the WL-12 and named it WL-12R
(Refined). In 1988, the WL-12R succeeded in achieving faster walking of
0.8 [sec] per 0.3 [m] step.

With a small modification, the control method can effectively deal with any
external forces and moments acting on the walking robot, which can be regarded
as disturbances, provided they are known before the robot begins walking. We
aimed at realizing dynamic biped walking stabilized with trunk motion under
known external forces by applying this modified method to the biped walking
robot WL-12R.

We developed a system that uses a DD (Direct Drive) motor to generate the
external force that acts on the robot while it is walking.

In the experiments, the WL-12R realized various gaits under known external
force in dynamic walking stabilized with trunk motion on a flat floor. The
maximum external force was 10 [kgf].

In this paper, the author first introduces the control method for walking on a
disturbance-free flat floor, the development of the WL-12, and the walking
experiments. The author then introduces the method for walking under known
external forces and moments on a flat floor, and the walking experiments using
the WL-12R.







2 Control Method for Disturbance-Free Walking 

The control method for walking on a disturbance-free floor consists of two main
parts. One is an algorithm to compute the balancing motion of the trunk
automatically from the motion of the lower-limbs and a trajectory of the ZMP (Zero
Moment Point)[3] given arbitrarily before the robot begins walking. The other is
program control for walking using preset walking patterns[1] transformed from
the motion of the lower-limbs and the trunk.

In this section, the algorithm to compute balancing motion of the trunk is 
described. 

2.1 Modeling of Robot and ZMP 

From the point of view of system dynamics, let the machine model be called a
walking system.

Let the walking system be defined as follows:

1) The walking system, including its trunk, is modeled as one particle $m_0$ for
the trunk and $n$ particles $m_i$ ($i = 1, \dots, n$) for the lower-limbs, as shown in
Fig. 1.

2) The floor for walking is a rigid horizontal plane that cannot be moved by any
forces or moments (torques).

3) A Cartesian coordinate system O-XYZ is set, where the Z axis is vertical and
the plane formed by the X and Y axes coincides with the plane of the floor. It is
a fixed coordinate system.

The ZMP of walking motion of the system on the Cartesian coordinate 
system can be derived as follows: 

At first, each vector is defined as shown in Fig.2. Then, we obtain an 
equation of motion at the arbitrary point P, which is obtained by applying 
D'Alembert's Principle^]. 

$$\sum_{i=0}^{n} m_i\,(\mathbf{r}_i - \mathbf{P}) \times (\ddot{\mathbf{r}}_i + \mathbf{G}) + \mathbf{T} = 0 \qquad (1)$$

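By modifying (1) with $T_x = T_y = 0$ at the ZMP, the components of the ZMP are given by

$$X_{zmp} = \frac{\sum_{i=0}^{n} m_i(\ddot z_i + g_z)\,x_i - \sum_{i=0}^{n} m_i(\ddot x_i + g_x)\,z_i}{\sum_{i=0}^{n} m_i(\ddot z_i + g_z)}$$

$$Y_{zmp} = \frac{\sum_{i=0}^{n} m_i(\ddot z_i + g_z)\,y_i - \sum_{i=0}^{n} m_i(\ddot y_i + g_y)\,z_i}{\sum_{i=0}^{n} m_i(\ddot z_i + g_z)} \qquad (2)$$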
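Equation (2) translates directly into code. The sketch below assumes that each particle's position and acceleration are already known; the function name `zmp` is hypothetical. It follows the sign convention of (2), in which $g_z$ is positive (for example $\mathbf{G} = [0, 0, 9.8]$). For a robot at rest, the ZMP reduces to the ground projection of the centre of mass.

```python
def zmp(masses, positions, accelerations, g=(0.0, 0.0, 9.8)):
    """ZMP (X_zmp, Y_zmp) of a set of particles, following equation (2).

    masses: sequence of m_i; positions and accelerations: sequences of (x, y, z)
    and (x'', y'', z''). g is G = [g_x, g_y, g_z] with g_z positive, as in (2).
    """
    num_x = num_y = den = 0.0
    for m, (x, y, z), (ax, ay, az) in zip(masses, positions, accelerations):
        fx, fy, fz = ax + g[0], ay + g[1], az + g[2]  # components of r_i'' + G
        den += m * fz
        num_x += m * fz * x - m * fx * z
        num_y += m * fz * y - m * fy * z
    return num_x / den, num_y / den


# Static robot: the ZMP is the ground projection of the centre of mass.
print(zmp([10.0, 10.0], [(0.1, 0.0, 0.8), (-0.1, 0.05, 0.4)], [(0, 0, 0), (0, 0, 0)]))
```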







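A translationally moving coordinate system W-XYZ is established on the waist of
the robot, parallel to the fixed coordinate system O-XYZ, to ease consideration of
the relative motion of the trunk's particle $m_0$. $[x_q, y_q, z_q]$ are the coordinates
of the origin of W-XYZ relative to the origin of O-XYZ.

Let (2) be modified into equations on the moving coordinates, with the particle
positions $[x_i, y_i, z_i]$ and the ZMP $[x_{zmp}, y_{zmp}]$ now measured in W-XYZ:

$$x_{zmp} = \frac{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z)\,x_i - \sum_{i=0}^{n} m_i(\ddot x_i + \ddot x_q + g_x)(z_i + z_q)}{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z)}$$

$$y_{zmp} = \frac{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z)\,y_i - \sum_{i=0}^{n} m_i(\ddot y_i + \ddot y_q + g_y)(z_i + z_q)}{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z)} \qquad (3)$$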



2.2 Solution for Trunk Motion 

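We rearrange (3) so that the terms involving the trunk's particle are on the
left-hand side and the remaining terms, divided by $m_0$ and denoted $\alpha(t)$ and
$\beta(t)$, are on the right-hand side:

$$(z_0+z_q)\,\ddot x_0 - (\ddot z_0+\ddot z_q+g_z)\,x_0 + (z_0+z_q)(\ddot x_q+g_x) + (\ddot z_0+\ddot z_q+g_z)\,x_{zmp} = \alpha(t)$$

$$(z_0+z_q)\,\ddot y_0 - (\ddot z_0+\ddot z_q+g_z)\,y_0 + (z_0+z_q)(\ddot y_q+g_y) + (\ddot z_0+\ddot z_q+g_z)\,y_{zmp} = \beta(t) \qquad (4)$$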



In general, the two equations of (4) are coupled, non-linear differential
equations, because each contains the same variable $z_0$ and the trunk is usually
connected to the lower-limbs through rotational joints (so that $z_0$ is not linear
in $x_0$ and $y_0$). It is therefore difficult to derive analytic solutions
from these equations. So we assume that the trunk particle $m_0$ does not move
vertically, in order to decouple and linearize them:

$$\ddot z_0 + \ddot z_q = 0, \qquad z_0 + z_q = \text{constant} \qquad (5)$$

Then, we obtain decoupled linear differential equations. 



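$$(z_0+z_q)\,\ddot x_0 - g_z\,x_0 = \alpha'(t)$$

$$(z_0+z_q)\,\ddot y_0 - g_z\,y_0 = \beta'(t) \qquad (6)$$

where $\alpha'(t) = \alpha(t) - (z_0+z_q)(\ddot x_q + g_x) - g_z\,x_{zmp}$ and $\beta'(t) = \beta(t) - (z_0+z_q)(\ddot y_q + g_y) - g_z\,y_{zmp}$.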

As an initial value problem, the trajectories given by equations (6) do not
converge, because the characteristic equation of (6), $(z_0+z_q)s^2 - g_z = 0$,
has a positive real root. In steady walking, however, each particle of the
lower-limbs and the ZMP movement are planned periodically with respect to the
moving coordinate system W-XYZ. In that case $\alpha(t)$ and $\beta(t)$ (and hence
$\alpha'(t)$ and $\beta'(t)$) are known periodic functions, so these equations have
periodic particular solutions, each of which can be represented as a Fourier
series. Therefore, by using the FFT (Fast Fourier Transformation)^], we can easily
obtain approximate periodic solutions of (6) giving the motion of the trunk's
particle.
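Here is a minimal sketch of the FFT step for one axis of (6). It assumes NumPy is available and that $\alpha'(t)$ has already been sampled at evenly spaced instants over one walking cycle. The flow chart in Fig. 3 mentions modifying the Fourier coefficients. Here that step is taken to mean dividing each coefficient by the characteristic polynomial of (6) evaluated at $i\omega_k$, the standard way to obtain a periodic particular solution. The author's exact implementation is not given, and the function name and test signal are hypothetical.

```python
import numpy as np


def periodic_trunk_solution(alpha_prime, period_s: float, height_m: float, g_z: float = 9.8):
    """Periodic particular solution of (z0 + zq) * x'' - g_z * x = alpha'(t), equation (6).

    alpha_prime holds samples of alpha'(t) at n evenly spaced instants over one period;
    height_m plays the role of the constant z0 + zq.
    """
    a = np.asarray(alpha_prime, dtype=float)
    n = a.size
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=period_s / n)  # angular frequency of each bin
    coeffs = np.fft.rfft(a)
    # A mode A * exp(i w t) gives (-height * w^2 - g_z) X = A, so X = -A / (height * w^2 + g_z).
    # The denominator is never zero, which is why a unique periodic solution exists.
    x_coeffs = -coeffs / (height_m * omega**2 + g_z)
    return np.fft.irfft(x_coeffs, n=n)


# Hypothetical forcing: one harmonic over a 1.3 s cycle, trunk particle 1.0 m high.
period, height, n = 1.3, 1.0, 128
t = np.arange(n) * period / n
alpha = np.cos(2.0 * np.pi * t / period)
x0 = periodic_trunk_solution(alpha, period, height)
```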

The approximate solutions derived above are effective only for slow walking
or for a robot with a long trunk. In other cases, the ZMP has considerable
error. So I worked out a repeating algorithm to obtain strict periodic
solutions of the nonlinear equations (4), as follows:

First, we substitute the approximate periodic solutions of the linear
equations (6) into equations (3), and compute the trajectories of the ZMP
from (3). We subtract the errors between the computed and the planned time
trajectories of the ZMP from the ZMP terms in the linear equations (6), and
compute the approximate periodic solutions again. This operation is repeated
until the error of the ZMP falls within tolerance. In this way we obtain strict
periodic solutions of the nonlinear equations (4), giving the trunk particle
motion with an accuracy suitable for practical use. A flow chart of the
repeating algorithm is shown in Fig. 3.
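A deliberately small toy problem illustrates the repeating algorithm. Assume a single trunk particle, no leg particles, a waist that stays still and $g_x = 0$. Let the particle height $h(t)$ vary slightly over the cycle, so that the constant-height model (6) is only approximate. The "exact" ZMP then follows from equation (2) for one particle: $x_{zmp} = x_0 - h\ddot x_0/(\ddot h + g_z)$. The function names and all numbers are hypothetical. This is a sketch of the idea, not the author's program.

```python
import numpy as np

G = 9.8  # g_z, m/s^2


def _omega(n: int, period: float) -> np.ndarray:
    return 2.0 * np.pi * np.fft.rfftfreq(n, d=period / n)


def second_derivative(x: np.ndarray, period: float) -> np.ndarray:
    """Spectral second derivative of a periodic, evenly sampled signal."""
    return np.fft.irfft(-_omega(x.size, period) ** 2 * np.fft.rfft(x), n=x.size)


def solve_linear(zmp_term: np.ndarray, period: float, h0: float) -> np.ndarray:
    """Periodic solution of h0 * x'' - G * x = -G * zmp_term (constant-height model, eq. 6 form)."""
    omega = _omega(zmp_term.size, period)
    return np.fft.irfft(G * np.fft.rfft(zmp_term) / (h0 * omega**2 + G), n=zmp_term.size)


def exact_zmp(x: np.ndarray, h: np.ndarray, period: float) -> np.ndarray:
    """ZMP of one particle whose height h(t) varies: equation (2) with a single mass, g_x = 0."""
    return x - h * second_derivative(x, period) / (second_derivative(h, period) + G)


def iterate_trunk_motion(planned_zmp, h, period, tol=1e-6, max_iter=50):
    """Repeat: solve the linear model, evaluate the exact ZMP, subtract the error, re-solve."""
    h0 = float(np.mean(h))            # height assumed by the linearized model
    zmp_term = planned_zmp.copy()
    for iteration in range(1, max_iter + 1):
        x = solve_linear(zmp_term, period, h0)
        error = exact_zmp(x, h, period) - planned_zmp
        if np.max(np.abs(error)) < tol:
            return x, iteration
        zmp_term = zmp_term - error   # correct the ZMP term of the linear equation
    raise RuntimeError("ZMP error did not fall within tolerance")


# Hypothetical example: 1.3 s cycle, ZMP swaying +/- 5 cm, trunk height 0.8 m +/- 1 cm.
period, n = 1.3, 256
t = np.arange(n) * period / n
h = 0.8 + 0.01 * np.sin(4.0 * np.pi * t / period)
planned = 0.05 * np.sin(2.0 * np.pi * t / period)
x0, iterations = iterate_trunk_motion(planned, h, period)
print("converged after", iterations, "iterations")
```

If the height is constant, the first pass is already exact. The iterations only correct for the part of the motion the linear model leaves out.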

I have not proved the convergence of the solutions, but I checked it
numerically using computer simulations with different parameter values. An
example in which the trunk length is varied is shown in Fig. 4, and another
example, in which the walking time is varied, is shown in Fig. 5. These figures
indicate that the method is useful in practice.

2.3 Expansion into Complete Walking 

This algorithm is applicable not only to steady walking but also to complete 
walking which starts from a static standing state and returns to a static standing 
state. That is, we regard the whole complete walking motion as one walking 
cycle, and apply the algorithm to it. Then, we can obtain trunk motion for the 
complete walking as shown in Fig.6. 

It is necessary to leave a relatively long period of standing time before
starting and after stopping walking, because the image function (in the frequency
domain) of the impulse response of (6) is

$$X_0(\omega) = \frac{2a}{\omega^2 + a^2}\, b, \qquad a = \sqrt{\frac{g_z}{z_0+z_q}}, \qquad b = -\frac{1}{2\sqrt{g_z\,(z_0+z_q)}} \qquad (7)$$


[Fig. 1: modeling of biped walking robot having trunk (fixed coordinates O-XYZ, moving coordinates W-XYZ; lateral plane)]

[Fig. 2: definitions of vectors for walking system]
$m_i$: mass of particle $i$ (a scalar)
$\mathbf{r}_i = [x_i, y_i, z_i]$: position vector of particle $i$
$\mathbf{P} = [X_p, Y_p, 0]$: position vector of P
$\mathbf{G} = [g_x, g_y, g_z]$: gravitational acceleration
$\mathbf{T} = [T_x, T_y, T_z]$: total torque acting on P

[Fig. 3: flow chart of iteration algorithm. START; ZMP equations of walking robot; non-linear differential equations; linearized differential equations; plan motions of lower-limbs and time trajectory of ZMP; FFT; modify Fourier coefficients; inverse FFT; approximate periodic solutions; substitute periodic solutions into ZMP equations on moving coordinate; compute errors between computed and planned ZMP; if the error is not within tolerance, subtract errors from ZMP term of linearized equations and repeat; otherwise output strict periodic solutions of trunk motion; END]

[Fig. 6: example of complete walking computed by the iteration algorithm (step: 0.3 [m], walking time: 1.3 [sec/step])]



So, the original function (in the time domain) of (7) is

$$x_0(t) = b\,e^{-a|t|} \qquad (8)$$

From (8), we can see that enough time must be taken for the velocity to
approach zero in the static standing state, as shown in Fig. 7.
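From (8), both $x_0$ and its velocity decay as $e^{-a|t|}$ with $a = \sqrt{g_z/(z_0+z_q)}$. The time needed to settle to a fraction $\varepsilon$ of the peak is therefore $\ln(1/\varepsilon)/a$. The sketch below uses hypothetical trunk heights and a 1% threshold. A higher trunk particle gives a smaller $a$ and so needs a longer standing period.

```python
import math


def decay_rate(height_m: float, g_z: float = 9.8) -> float:
    """a = sqrt(g_z / (z0 + zq)) from equation (7)."""
    return math.sqrt(g_z / height_m)


def standing_time_s(height_m: float, fraction: float, g_z: float = 9.8) -> float:
    """Time for the e^(-a|t|) response in (8) to shrink to `fraction` of its peak."""
    return math.log(1.0 / fraction) / decay_rate(height_m, g_z)


for height in (0.6, 1.0):  # hypothetical trunk-particle heights z0 + zq, m
    print(f"z0+zq = {height} m: about {standing_time_s(height, 0.01):.2f} s to reach 1%")
```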



3 Configuration of Biped Walking Robot WL-12 

3.1 Machine Model WL-12 

My coworkers and I developed a machine model, the WL-12, to test the control
method. The WL-12 weighs about 107 [kg] and stands about 1.8 [m] tall when not
walking. An assembly drawing and a photograph of the WL-12 are shown in
Fig. 8 and Fig. 9.

The WL-12 is made up of two legs as lower-limbs, having 6 degrees of
freedom (DOFs) in total, and a trunk having 3 DOFs with a 30 [kg] balancing mass,
as shown in Fig. 10. The DOFs on each leg are rotational pitch axes at the hip, the
knee and the ankle. Two of the trunk's DOFs are rotational pitch and roll axes at
the waist, and the remaining DOF is a translational joint connecting the
rotational joints to the balancing mass.

An electro-hydraulic servo system is employed for actuation: an RA (Rotary
Actuator) with a servo valve on each rotational joint, and a hydraulic cylinder
with a servo valve on the translational joint.

The robot's structural frame is made mainly of CFRP (Carbon Fiber
Reinforced Plastic), and parts such as the actuators and manifold are made of
duralumin, to reduce the total weight.

As for sensors, each RA is equipped with a potentiometer and a tachometer
generator that detect the rotational angle and angular velocity. The
potentiometer and the tachometer generator are directly connected to the shaft of
the RA, which makes the feedback control accurate and stable. Each RA is also
equipped with 2 pressure sensors in the hydraulic circuit between the RA and the
servo valve, which monitor the output torque. The hydraulic cylinder is equipped
with a linear potentiometer and a linear velocity sensor for feedback control.
Each of the soles is equipped with 2 microswitches that monitor floor contact at
the toe and the heel.

3.2 Control System 

The control system is installed separately on the right and left sides of the
waist, and controls the WL-12 on a stand-alone basis.

The control system is shown as a block diagram in Fig. 11. It has a
hierarchical structure consisting of a main control board using a 16-bit CPU,
the Z8002[6], and three local control boards, each also using a 16-bit Z8002 CPU.
Each local control board controls 3 DOFs. The main control board outputs command
signals to the local control boards. Communication between the control boards
uses 8-bit asynchronous parallel transmission through two-port RAMs, at a speed
of about 2 [Mbyte/sec].

[Fig. 11: block diagram of control system]



4 Walking Experiment on Disturbance-Free Floor 

As a result of walking experiments on a disturbance-free flat floor, various gaits 
in dynamic walking stabilized with trunk motion were realized. The minimum 
walking time was 1.3[sec] a step and the maximum step length was 0.3[m], 

The angle and angular velocity responses in the walking experiments are shown in Fig. 12. In Fig. 12(a), we see that the angle responses almost tracked the preset walking patterns. In Fig. 12(b), however, we see that the angular velocity responses of some joints could not exceed about 150 [deg/sec]. I think that the angular velocities in the preset walking patterns exceeded the angular-velocity limits of the actuators; for a faster gait, it would therefore be necessary to make preset walking patterns that take these limits into account.
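One way to take such a limit into account is to screen each preset joint pattern before walking. The sketch below estimates the angular velocity of a sampled pattern by central differences and flags samples above a threshold. The default of 150 [deg/sec] is only the saturation level observed in Fig. 12(b), not a documented actuator specification, and the sinusoidal pattern and its 30 [deg] amplitude are hypothetical.

```python
import math


def velocity_limit_violations(angles_deg, dt, limit_deg_per_s=150.0):
    """Return (sample index, angular velocity) pairs at which a preset joint
    angle pattern demands more angular velocity than the assumed limit."""
    violations = []
    for i in range(1, len(angles_deg) - 1):
        # central-difference estimate of the angular velocity [deg/sec]
        omega = (angles_deg[i + 1] - angles_deg[i - 1]) / (2.0 * dt)
        if abs(omega) > limit_deg_per_s:
            violations.append((i, omega))
    return violations


def sinusoidal_pattern(amplitude_deg, step_time_s, dt):
    """Hypothetical joint pattern: one sine cycle per step, sampled every dt."""
    n = int(round(step_time_s / dt))
    return [amplitude_deg * math.sin(2.0 * math.pi * k * dt / step_time_s)
            for k in range(n + 1)]
```

With this hypothetical 30 [deg] pattern, a 1.3 [sec] step stays just below the threshold, while the same pattern compressed into 0.8 [sec] exceeds it. A pattern generator could use such a check to stretch or reshape the offending segments.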



5 Control Method for Walking under External Force 

The control method is basically the same as the method for walking without disturbance described above. It consists of two main parts. One is an algorithm that automatically computes the balancing motion of the trunk from the motion of the lower limbs, the external forces and moments, and a ZMP trajectory specified arbitrarily before the robot begins walking. The other is a program control for walking, which uses a preset walking pattern executed by the motion of the lower limbs and the trunk.

In this section, the algorithm that computes balancing motion of the trunk is 
described. 

5.1 ZMP with External Forces and Moments 

The ZMP of the walking motion of the system modeled in 2.1 can be derived in the Cartesian coordinate system as follows.

First, each vector is defined as shown in Fig. 13. Then, by applying D'Alembert's principle, we obtain the equation of motion about an arbitrary point P on the floor:

$$\sum_{i} m_i(\mathbf{r}_i - \mathbf{P}) \times (\ddot{\mathbf{r}}_i + \mathbf{G}) + \mathbf{T} - \sum_{j} \mathbf{M}_j - \sum_{k} (\mathbf{S}_k - \mathbf{P}) \times \mathbf{F}_k = \mathbf{0} \qquad (9)$$

where

$m_i$ : mass of particle $i$ (a scalar)
$\mathbf{r}_i = [x_i, y_i, z_i]$ : position vector of particle $i$
$\mathbf{P} = [x_p, y_p, 0]$ : position vector of P
$\mathbf{G} = [g_x, g_y, g_z]$ : gravitational acceleration
$\mathbf{T} = [T_x, T_y, T_z]$ : total torque acting on P
$\mathbf{M}_j = [M_{xj}, M_{yj}, M_{zj}]$ : external moment $j$
$\mathbf{F}_k = [F_{xk}, F_{yk}, F_{zk}]$ : external force $k$
$\mathbf{S}_k = [x_{sk}, y_{sk}, z_{sk}]$ : position vector of the point where external force $k$ is applied

Fig. 13 definitions of vectors for walking system



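By rearranging (9) under the condition that the torque about the ZMP has no horizontal components ($T_x = T_y = 0$), the components of the ZMP are obtained:

$$x_{zmp} = \frac{\sum_{i=0}^{n} m_i(\ddot z_i + g_z)x_i - \sum_{i=0}^{n} m_i(\ddot x_i + g_x)z_i + \sum_j M_{yj} + \sum_k (z_{sk}F_{xk} - x_{sk}F_{zk})}{\sum_{i=0}^{n} m_i(\ddot z_i + g_z) - \sum_k F_{zk}}$$

$$y_{zmp} = \frac{\sum_{i=0}^{n} m_i(\ddot z_i + g_z)y_i - \sum_{i=0}^{n} m_i(\ddot y_i + g_y)z_i - \sum_j M_{xj} - \sum_k (y_{sk}F_{zk} - z_{sk}F_{yk})}{\sum_{i=0}^{n} m_i(\ddot z_i + g_z) - \sum_k F_{zk}} \qquad (10)$$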
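Eq. (10) can be evaluated directly for a set of point masses. The sketch below assumes a z axis pointing upward, so that $g_z$ is positive (e.g. $\mathbf{G} = [0, 0, 9.81]$); the document does not state this convention explicitly. The function and argument names are hypothetical.

```python
def zmp_with_external_forces(masses, positions, accelerations, moments,
                             forces, application_points,
                             gravity=(0.0, 0.0, 9.81)):
    """Evaluate (x_zmp, y_zmp) following Eq. (10).

    positions / accelerations: (x, y, z) and (x'', y'', z'') of each particle i.
    moments: external moments (Mx, My, Mz); forces: external forces
    (Fx, Fy, Fz) applied at application_points (xs, ys, zs).
    gravity: G = (gx, gy, gz), with gz > 0 for an upward z axis (assumed).
    """
    gx, gy, gz = gravity
    num_x = num_y = den = 0.0
    for m, (x, y, z), (ax, ay, az) in zip(masses, positions, accelerations):
        num_x += m * (az + gz) * x - m * (ax + gx) * z
        num_y += m * (az + gz) * y - m * (ay + gy) * z
        den += m * (az + gz)
    for mx, my, _ in moments:
        num_x += my          # + sum of M_yj
        num_y -= mx          # - sum of M_xj
    for (fx, fy, fz), (xs, ys, zs) in zip(forces, application_points):
        num_x += zs * fx - xs * fz
        num_y -= ys * fz - zs * fy
        den -= fz
    if den <= 0.0:
        raise ValueError("the denominator of Eq. (10) must be positive")
    return num_x / den, num_y / den
```

For a single static mass at height 1 [m], a backward pull of 1 [N] applied at the same height moves the ZMP backward by $1/9.81$ [m], which is the moment that the trunk motion has to compensate.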



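Let (10) be rewritten in the moving coordinate system ($x_q$, $y_q$, $z_q$ denote the motion of its origin). Each component is divided into two parts: one contains the terms of the motion of the robot, and the other the terms of the external forces and moments.

$$x_{zmp} = \underbrace{\frac{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z)x_i - \sum_{i=0}^{n} m_i(\ddot x_i + \ddot x_q + g_x)(z_i + z_q)}{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z) - \sum_k F_{zk}}}_{\text{robot motion}} + \underbrace{\frac{\sum_j M_{yj} + \sum_k \{(z_{sk} + z_q)F_{xk} - x_{sk}F_{zk}\}}{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z) - \sum_k F_{zk}}}_{\text{external forces and moments}}$$

$$y_{zmp} = \underbrace{\frac{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z)y_i - \sum_{i=0}^{n} m_i(\ddot y_i + \ddot y_q + g_y)(z_i + z_q)}{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z) - \sum_k F_{zk}}}_{\text{robot motion}} - \underbrace{\frac{\sum_j M_{xj} + \sum_k \{y_{sk}F_{zk} - (z_{sk} + z_q)F_{yk}\}}{\sum_{i=0}^{n} m_i(\ddot z_i + \ddot z_q + g_z) - \sum_k F_{zk}}}_{\text{external forces and moments}} \qquad (11)$$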



5.2 Solution for Trunk Motion 

We rewrite (11) so that the terms of the trunk's particle ($i = 0$, mass $m_0$) are on the left-hand side and the rest, namely $\alpha(t)$ and $\beta(t)$, are on the right-hand side:

$$z_0(\ddot x_q + g_x) + \ddot z_0\,x_{zmp} + (z_0 + z_q)\ddot x_0 - (\ddot z_0 + \ddot z_q + g_z)x_0 = \alpha(t)$$

$$z_0(\ddot y_q + g_y) + \ddot z_0\,y_{zmp} + (z_0 + z_q)\ddot y_0 - (\ddot z_0 + \ddot z_q + g_z)y_0 = \beta(t) \qquad (12)$$



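We divide $\alpha(t)$ and $\beta(t)$ into two parts: one contains the terms of the motion of the lower limbs and the trajectory of the ZMP, and the other contains the terms of the external forces.

$$\alpha(t) = \alpha_1(t) + \alpha_2(t), \qquad \beta(t) = \beta_1(t) + \beta_2(t) \qquad (13)$$

where

$$\alpha_1(t) = \frac{1}{m_0}\Big[-\sum_{i=1}^{n} m_i(\ddot z_i + \ddot z_q + g_z)x_{zmp} + \sum_{i=1}^{n} m_i(\ddot z_i + \ddot z_q + g_z)x_i - \sum_{i=1}^{n} m_i(\ddot x_i + \ddot x_q + g_x)(z_i + z_q) - m_0(\ddot x_q + g_x)z_q - m_0(\ddot z_q + g_z)x_{zmp}\Big]$$

$$\alpha_2(t) = \frac{1}{m_0}\Big[\sum_k F_{zk}x_{zmp} + \sum_j M_{yj} + \sum_k \{(z_{sk} + z_q)F_{xk} - x_{sk}F_{zk}\}\Big]$$

$$\beta_1(t) = \frac{1}{m_0}\Big[-\sum_{i=1}^{n} m_i(\ddot z_i + \ddot z_q + g_z)y_{zmp} + \sum_{i=1}^{n} m_i(\ddot z_i + \ddot z_q + g_z)y_i - \sum_{i=1}^{n} m_i(\ddot y_i + \ddot y_q + g_y)(z_i + z_q) - m_0(\ddot y_q + g_y)z_q - m_0(\ddot z_q + g_z)y_{zmp}\Big]$$

$$\beta_2(t) = \frac{1}{m_0}\Big[\sum_k F_{zk}y_{zmp} - \sum_j M_{xj} - \sum_k \{y_{sk}F_{zk} - (z_{sk} + z_q)F_{yk}\}\Big] \qquad (14)$$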



As described in 2.2, $\alpha(t)$ must consist of known terms. $\alpha_1(t)$ is known because the motion of the lower limbs and the trajectory of the ZMP are given before the robot begins walking. Therefore, $\alpha_2(t)$ must also consist of known terms. The same applies to $\beta(t)$. Thus, when the external forces and moments are known, the time trajectory of the trunk can be derived.

The two equations of (12) are coupled with each other and are nonlinear differential equations. How to obtain a strict periodic solution for the time trajectory of the trunk was already discussed in 2.2 (see Fig. 3).
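The procedure of 2.2 is not repeated here. As an illustration only, suppose the trunk and the moving-frame origin keep a constant height ($\ddot z_0 = \ddot z_q = 0$). The first equation of (12) then reduces to the linear equation $h\,\ddot x_0 - g_z x_0 = \tilde\alpha(t)$, with $h = z_0 + z_q$ and $\tilde\alpha(t) = \alpha(t) - z_0(\ddot x_q + g_x)$. Its periodic solution can be found harmonic by harmonic with a discrete Fourier transform (in the spirit of reference [5]). This is a hypothetical simplification, not necessarily the author's method: it ignores the coupling and nonlinear terms that an iterative scheme would have to reintroduce. A naive DFT is used so the sketch needs only the standard library.

```python
import cmath
import math


def periodic_trunk_motion(alpha, period, height, g=9.81):
    """Periodic x0(t) with height * x0'' - g * x0 = alpha(t) (linearized sketch).

    alpha: samples of the known right-hand side over one walking period,
    taken at equal intervals; height = z0 + zq is assumed constant.
    """
    n = len(alpha)
    # forward DFT of the sampled right-hand side (O(n^2), fine for a sketch)
    spectrum = [sum(alpha[m] * cmath.exp(-2j * math.pi * k * m / n)
                    for m in range(n))
                for k in range(n)]
    solution_spectrum = []
    for k, a_k in enumerate(spectrum):
        harmonic = k if k <= n // 2 else k - n       # signed harmonic number
        omega = 2.0 * math.pi * harmonic / period   # angular frequency [rad/s]
        # height * (-omega^2) X - g X = A  =>  X = -A / (height * omega^2 + g)
        solution_spectrum.append(-a_k / (height * omega ** 2 + g))
    # inverse DFT; imaginary parts are only round-off for a real alpha(t)
    return [sum(solution_spectrum[k] * cmath.exp(2j * math.pi * k * m / n)
                for k in range(n)).real / n
            for m in range(n)]
```

Because $h\omega^2 + g_z$ is always positive, every harmonic has a unique periodic response. This is why a periodic solution exists for any periodic right-hand side in this linearized setting.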



6 Walking Simulations under External Force 

According to the method in chapter 5, external forces and moments may have any direction and strength as long as the denominator of (10) is positive. In our experiments, however, we chose to pull the robot backward with a wire. To match this setup, Fig. 13 shows simulations in which a rectangular external force is applied to the waist of the robot opposite to the walking direction, for different walking speeds and force strengths.

In Fig. 13, we can see that, against the backward external force, the robot compensates for the moment that shifts the ZMP by tilting its trunk forward.

The faster the walking speed, the smaller the trunk's stabilizing motion becomes, because the inertial force contributes to the stabilization.

The stabilizing motion against the external force begins a little before the force is applied and ends a little after the force returns to zero. The reason for this was described in 2.3.

In this method, stabilization against an external force is produced only by the trunk, so the stronger the external force, the larger the trunk motion about the pitch axis becomes. Since the WL-12R has a limited movable angle, the strength of the external force that it can stabilize is limited. This limitation is especially significant when the walking speed is slow, because the stabilizing effect of the inertial force is then reduced. For example, when the walking time is 2.6 [sec] per step and the external force is 10 [kgf], the simulation shows phases in which the motion of the trunk's pitch axis exceeds the movable angle of its actuator.
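A rough feel for why the trunk motion grows with the force comes from the quasi-static limit of (10), which approximates very slow walking where inertial help vanishes. Setting all accelerations and $g_x$ to zero, a horizontal force $F_x$ applied at height $z_s$ adds $z_s F_x$ to the numerator of $x_{zmp}$. Keeping the ZMP unchanged with the trunk particle alone then requires $m_0 g_z \Delta x_0 + z_s F_x = 0$. The application height and trunk mass used below are hypothetical, not WL-12R data.

```python
KGF_TO_N = 9.80665  # 1 kgf expressed in newtons


def quasi_static_trunk_shift(force_kgf, force_height_m, trunk_mass_kg,
                             g=9.80665):
    """Forward trunk displacement [m] cancelling the ZMP shift of a backward
    horizontal force, with all accelerations set to zero (quasi-static)."""
    fx = -force_kgf * KGF_TO_N  # backward force along -x [N]
    # m0 * g * dx0 + zs * Fx = 0 keeps the numerator of x_zmp unchanged
    return -force_height_m * fx / (trunk_mass_kg * g)
```

The required shift is proportional to the force. With, say, a force applied 0.8 [m] above the floor and a 20 [kg] trunk mass (both hypothetical), 10 [kgf] would call for a 0.4 [m] forward shift of the trunk particle. This illustrates how a limited movable angle caps the force that can be stabilized at slow speeds.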



7 WL-12R and External Force Generating System 

7.1 Biped Walking Robot WL-12R 

We improved the WL-12 mainly in two respects and renamed it the WL-12R. An assembly drawing of the WL-12R is shown in Fig. 14.

First, the translational joint on the trunk was removed, because with the iterative algorithm for solving the nonlinear problem there was no longer any need to move the trunk's balancing mass linearly. The WL-12R therefore has 8 DOFs in total, including 2 DOFs on the trunk.

Second, some parts of the robot's structural frame were reinforced with CFRP plates to reduce elastic vibration during walking.

7.2 External Force Generating System 

The walking experiment system under external force is shown in Fig. 15. As described in chapter 6, the direction of the force in this system is limited to the direction opposite to the walking direction, and the robot is pulled backward by a wire. To generate the external forces we used a DD (direct-drive) motor, which is easy to control and, unlike motors with reduction mechanisms such as gearing, has no backlash. A pulley connected directly to the motor winds up the wire, converting the torque of the motor into the tension of the wire.
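The pulley relation is simply tension = torque / radius. The sketch below converts between the motor torque and the wire tension in the kgf units used in this paper. The pulley radius is a hypothetical parameter. The relation is static: it assumes an inextensible wire wound in a single layer, and it ignores friction and the torque needed to accelerate the pulley and rotor.

```python
KGF_TO_N = 9.80665  # 1 kgf in newtons


def motor_torque_for_tension(tension_kgf, pulley_radius_m):
    """Static torque [N*m] that the DD motor must produce for a given wire
    tension, assuming a constant effective pulley radius."""
    return tension_kgf * KGF_TO_N * pulley_radius_m


def wire_tension_kgf(torque_nm, pulley_radius_m):
    """Inverse relation: wire tension [kgf] produced by a given motor torque."""
    return torque_nm / (pulley_radius_m * KGF_TO_N)
```

For example, with a hypothetical 0.05 [m] pulley, a 10 [kgf] pull needs about 4.9 [N*m] of static torque. Because the motor drives the pulley directly, no gear ratio or backlash enters this relation.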




(Figure note: offsets of the force strength are given as the initial tension of the wire.)



As for the sensors, the DD motor is equipped with a pulse encoder to measure the distance walked by the robot and a tachogenerator to measure the walking speed. A force sensor installed on the wire measures the force acting on the robot.
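Both motor-side sensors measure the pulley rotation, so converting their readings to walking quantities requires the pulley radius. The sketch below assumes a taut wire, so that the wire length paid out equals the distance walked. It also assumes a tachogenerator voltage proportional to the pulley's angular velocity. The encoder resolution, tachogenerator constant and pulley radius are hypothetical parameters.

```python
import math


def distance_walked(encoder_counts, counts_per_rev, pulley_radius_m):
    """Distance walked [m], taken as the wire length paid out by the pulley."""
    return 2.0 * math.pi * pulley_radius_m * encoder_counts / counts_per_rev


def walking_speed(tacho_voltage, volts_per_rad_s, pulley_radius_m):
    """Walking speed [m/s] from a tachogenerator voltage that is assumed to be
    proportional to the pulley's angular velocity."""
    return pulley_radius_m * tacho_voltage / volts_per_rad_s
```

If the wire vibrates, both readings pick up the vibration together with the robot's motion, because they are taken at the motor rather than on the robot itself.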



8 Walking Experiments under External Force 

My coworkers and I performed walking experiments on a flat floor using the WL-12R and the external force generating system. As a result, various gaits in dynamic walking under known external forces, stabilized with trunk motion, were realized. The force patterns were rectangular and trapezoidal, with a maximum strength of 10 [kgf].
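A preset force pattern of this kind can be generated as a piecewise-linear function of time. In the sketch below, a rectangular pattern is the special case with zero rise and fall times. The parameter names and the optional offset (a nonzero baseline force) are hypothetical and are not taken from the experimental setup.

```python
def trapezoidal_force(t, t_start, rise, hold, fall, peak_kgf, offset_kgf=0.0):
    """Preset force [kgf] at time t: ramps from offset to peak over `rise`
    seconds, holds for `hold` seconds, then ramps back over `fall` seconds.
    rise = fall = 0 gives a rectangular pattern."""
    t1 = t_start + rise          # end of the ramp-up
    t2 = t1 + hold               # end of the plateau
    t3 = t2 + fall               # end of the ramp-down
    amplitude = peak_kgf - offset_kgf
    if t < t_start or t >= t3:
        return offset_kgf
    if t < t1:                   # only reached when rise > 0
        return offset_kgf + amplitude * (t - t_start) / rise
    if t < t2:
        return peak_kgf
    return peak_kgf - amplitude * (t - t2) / fall   # only reached when fall > 0
```

The zero-duration ramps never divide by zero, because the corresponding branches cannot be reached when `rise` or `fall` is zero.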

Fig. 16 shows the force strength, the distance walked and the walking velocity measured by the sensors of the external force generating system.

In Fig. 16(a), the force response almost tracked the preset pattern except for short-period vibrations, which were caused by vibration of the wire. In Fig. 16(b), the walking velocity response appears to differ considerably from the preset pattern; apart from the short-period vibrations, however, it tracks the preset pattern almost exactly. These vibrations have the same cause as those in the force response.

The same figure also shows a long-period vibration after the robot stops walking. I think this is because the robot rocked for a while after it stopped, and the sensor recorded this motion.



9 Conclusion 

Various gaits in dynamic complete walking on a disturbance-free flat floor, stabilized with trunk motion, were achieved by the biped walking robot WL-12. The minimum walking period was 0.8 [sec] per step and the maximum step length was 0.3 [m].

Also, various gaits in dynamic complete walking under external force, stabilized with trunk motion, were achieved by the biped walking robot WL-12R. The maximum force strength was 10 [kgf].

As a result, my coworkers and I confirmed the performance of the biped walking robot WL-12(R), and experimentally supported the control methods for robot walking on a flat floor both without disturbance and under known external forces and moments.




Fig. 17 responses at walking experiment (step: 0.3 [m], walking time: 1.7 [sec/step]); (c) distance walked response compared with the preset pattern



Acknowledgements 

The author thanks KURODA PRECISION INDUSTRIES LTD., MOOG 
JAPAN LTD. and TORAY INDUSTRIES INC. for supporting us in 
developing the WL-12. 

This study was supported in part by CASIO SCIENCE PROMOTION 
FOUNDATION, Japan. 



References 

1. Takanishi, A., Ishida, M., Yamazaki, Y. and Kato, I.: The Realization of Dynamic Walking by the Biped Walking Robot WL-10RD, ICAR'85, 1985

2. Vukobratovic, M. and Frank, A.: On the Stability of Biped Locomotion, IEEE Trans. on Biomedical Engineering, Vol. BME-17, January 1970

3. Takanishi, A., et al.: Realization of Dynamic Biped Walking Stabilized with Trunk Motion, ROMANSY, 1988

4. Goldstein, H.: Classical Mechanics, Addison-Wesley, 1980

5. Cooley, J.W., Lewis, P.A.W. and Welch, P.D.: The Finite Fourier Transform, IEEE Trans. on Audio and Electroacoustics, Vol. AU-17, No. 2, pp. 77-85, June 1969

6. Z8000 Data Manual, Zilog Corporation, 1980




Part 4 

Intelligent Motor Control 





A New Concept of the Role of Proprioceptive and 
Recurrent Inhibitory Feedback in Motor Control 



Uwe Windhorst 

The University of Calgary, Faculty of Medicine, Departments of 
Clinical Neurosciences and Medical Physiology, 3330 Hospital Drive 
N.W., Calgary, Alberta, Canada T2N 4N1 



Abstract 

The mammalian motor control system is a multi-variable, nonlinear and time-varying system. With motor capabilities increasing throughout evolution, the nervous system has likely developed particular solutions to cope with the complex properties of its own executive instruments, in particular muscles. It is proposed here that spinal recurrent inhibition via Renshaw cells and proprioceptive feedback via muscle spindles monitor some important variables of muscle function and adapt spinal neural systems to them.

In mammals, many skeleto-motoneurone pools are a common source of both recurrent inhibition via Renshaw cells and proprioceptive feedback via muscle fibres and proprioceptors. On the other hand, the two feedback pathways exert common actions on a number of spinal neurones, including skeleto- and fusi-motoneurones, reciprocal Ia inhibitory interneurones and VSCT cells. This implies that these target neurones receive compound information.

This information concerns the basic determinants of skeletal muscle force production, which is determined primarily by the neural input to the muscle fibres and by their length and/or velocity of length change. The first variable is monitored by Renshaw cells. Muscle fibre length is measured by spindle group Ia and II afferents, and dynamic length change by Ia fibres in particular.

An important question is to what extent recurrent inhibition 
reflects, or predicts, neural muscle force production. This is 
considered with respect to the two dimensions of motoneurone 
output: recruitment and rate coding, and their relation to Renshaw 
cell and force output. Excitatory input to and, to a lesser 
extent, output from, Renshaw cells show a similar dependency on 
orderly motoneurone recruitment as does cumulative muscle force 
output. The static dependencies of Renshaw cell rate and muscle 
force on motoneurone activation rate also exhibit similar 
relationships, which for Renshaw cells may be adjustable by 
proprioceptive feedback of muscle length. 







1 Introduction 

In some sense, the task of robotics engineers is simpler 
than that of neurophysiologists. Whilst the former first 
define their goals and then search for adequate means to 
achieve them, the latter are confronted with an existing, 
functioning device, whose parts must be analyzed and 
functionally interpreted using notions and concepts whose 
appropriateness is not clear a priori. 

1.1 General Remarks and Assumptions 

1) The control of posture and movement in higher vertebrates is not a simple, but a complex task. Simple models, such as the defunct follow-up servo model, are therefore inadequate to deal with it.

2) The mammalian motor control system is a multi-variable, nonlinear and time-varying system. A model has to take account of and represent these properties.

3) The motor control problems to be solved in mammals are posed not only by the constraints originating from external and internal forces and from mechanical interactions between limb segments, but also by those implied by the very means (e.g. muscles, tendons, joints, etc.) that the organism evolved to solve them. With increasing motor capabilities, the nervous system has developed particular solutions to cope with the properties of its own instruments.

4) Proprioceptive feedback has evolved to monitor the 
organism's self-generated actions (Sherrington 1906; Evarts 
1981). But undoubtedly, it plays various roles in motor 
control at different levels of the central nervous system. 
Essentially the same applies to recurrent inhibition, this 
basic circuit being found ubiquitously throughout the 
neuraxis. However, at spinal level, proprioceptive and 
recurrent inhibitory feedback exert a common function, i.e., 
they are involved in the control of muscle force production, 
particularly in muscles acting across proximal limb joints. 







This paper presents a proposal confined to a subset of roles at the spinal level.

5) Descending motor commands have commonly been envisaged as modulating (even gating) the responses of spinal neural networks to peripheral inputs. In contrast, it is postulated here that peripheral inputs (including proprioceptive input) serve to modulate the responses of these networks to descending commands. Whereas this makes little difference physically, it does make a difference conceptually (see also Gottlieb and Agarwal 1980).

In the following, a general system description will first be given; then, owing to the limited space, the description will focus on some features of muscle force production and recurrent feedback that illustrate some of the general ideas.

1.2 General System Description 

In mammals many skeleto- (α, β-) motoneurone pools are a common source of both recurrent inhibition via Renshaw cells and proprioceptive feedback via muscle fibres and proprioceptors. On the other hand, the two feedback pathways exert common actions on a number of spinal neurones including skeleto- and fusi-motoneurones, reciprocal Ia inhibitory interneurones and VSCT cells (see Fig. 1; reviewed in Windhorst 1988). This implies that these target neurones receive a compound (vector) signal. This signal carries information about the basic determinants of skeletal muscle state or, more precisely, its force-producing capacity. Force production is determined primarily by the neural input to the muscle fibres, their length and/or velocity of length change (see Fig. 2). The first variable is monitored by Renshaw cells; muscle fibre length is measured by spindle group Ia and II afferents, and dynamic length change by Ia fibres in particular (Fig. 2).

The compound information generated by both feedback subsystems is used to adapt the parameters of spinal neural networks (thresholds, parameters of static and dynamic input-output relations and their distributions) to the prevailing muscle mechanical state for optimization of motor actions (Fig. 2). Depending on the motor task, this optimization may concern the precision, velocity or smoothness of movement, the stability of posture and movement, or the minimization of energy expenditure and mitigation of the consequences of muscle fatigue.

Fig. 1: Schematic comparison of some central neuronal connections involving group Ia afferents (from primary muscle spindle endings) and Renshaw cells. For simplicity, connections from other proprioceptive afferents are not depicted here. Renshaw cells receiving their main excitatory input from extensor α-motoneurones (α) inhibit synergistic α-motoneurones (but not antagonistic α-motoneurones acting around the same joint) and those reciprocal Ia inhibitory interneurones (recip. Ia inh. int.) which receive monosynaptic excitation from extensor Ia fibres and inhibit antagonistic flexor α-motoneurones (right side), and vice versa (not depicted). The same Renshaw cells inhibit extensor γ-motoneurones (left side) and VSCT cells that receive monosynaptic Ia excitation, whose exact origin is not yet known, however, and is here assumed to come from extensors (dashed connection).

Fig. 2: General scheme of an adaptive motor control system incorporating recurrent inhibition. All pathways are represented by double lines in order to indicate signal vectors. Skeleto-motoneurone (MN) output or muscle input is designated $\mathbf{d}_{MN}(t)$ and sensed by Renshaw cells. The muscle output of interest here is designated $\mathbf{d}_{MF}(t)$ and is sensed by muscle spindles. Renshaw cells and muscle spindles receive two components of the command (and other) input, $\mathbf{u}_3(t)$ and $\mathbf{u}_1(t)$, respectively. Their outputs, $\mathbf{e}_{RC}(t)$ and $\mathbf{e}_{MS}(t)$, are fed back to spinal networks, including skeleto- and fusi-motoneurones and Renshaw cells (and reciprocal Ia inhibitory interneurones, not represented for simplicity), for parameter adjustment, which is symbolized by the oblique arrows originating from the parameter-adjustment subsystem and drawn across the other subsystems. Anatomically, the parameter-adjustment subsystem is not sharply delineated from the other subsystems, as symbolized by the dashed box, but may contain further interneurones. It may thus also receive a separate input signal component $\mathbf{u}_2(t)$, different from those to Renshaw cells, fusi-motoneurones and skeleto-motoneurones. The entire input vectors are designated $\mathbf{U}(t)$. (With permission from Windhorst 1990, his Fig. 2)

In general, the force-producing capacity of muscle has to be adapted to a diversity of motor tasks and their contexts. These are determined by descending motor commands and further peripheral states signalled by a variety of segmental afferents (these signals are comprised in the vector $\mathbf{U}(t)$ in Fig. 2). Commands and afferent information (including spindle group II and Golgi tendon organ signals) thus influence not only skeleto-motoneurones (denoted by $\mathbf{u}_4(t)$ in Fig. 2), but also recurrent inhibition (via pathways circumventing skeleto-motoneurones, denoted by $\mathbf{u}_3(t)$ in Fig. 2) and proprioceptive feedback (via fusimotor spindle innervation, denoted by $\mathbf{u}_1(t)$ in Fig. 2). One result of this influence is to adjust the gains within the state feedback pathways according to the different tasks and contexts. Concerning the feedback from Ia afferents, it is therefore clear that, apart from their background excitation, their static and dynamic sensitivities (to length and velocity) should be adjustable by separate static and dynamic fusimotor neurones. A similar differential adjustment might occur in the recurrent inhibitory pathway, but has as yet hardly been investigated (Windhorst 1988).

This general concept requires that the organization and 
properties of both feedback subsystems be matched to a certain 
extent. The notion of matched inputs to skeleto-motoneurones, 
fusimotor and Renshaw cells is illustrated in Fig. 3, in 
relation to descending command signals as well as group II 
muscle spindle afferents (see also below). 

2 Specific Outline 

Generally, the behaviour of a system can be described in the form of two sets of equations. The first is a set of first-order (in general nonlinear) differential equations describing the change in system states as a function of their present values and the system inputs. The second describes the system output. In the following, vector notation (indicated by bold-face letters) is used to emphasize multi-input, multi-output aspects.

Fig. 3: Simplified block diagram of the "triple input" to α- and β-motoneurones (middle: α, β), Renshaw cells (left: RC) and γ-motoneurones (right: γ). Descending command inputs are regarded as "matched" if they excite α-, β- and γ-motoneurones but inhibit Renshaw cells (note that inhibition of these cells disinhibits α- and β-motoneurones), or vice versa. An example is provided by the modulatory influence of muscle length (L) on muscle spindle (MS) afferent signals and recurrent inhibition. The muscle block is divided into two parts. The one labelled "a Mu" (for active muscle) represents the unloading, firing-rate reducing effect of contraction on spindles (note sign inversion). The other, labelled "p Mu" (for passive muscle), represents the effect of externally generated length changes on spindle discharge, length increase causing firing rate acceleration (note positive sign). These parts are not separable in reality. Spindle afferents have feedback effects on homonymous (and heteronymous) neurones as follows. As already shown in Fig. 1, Ia fibres (Ia) monosynaptically excite homonymous skeleto-motoneurones, a polysynaptic excitation being represented by the additional interrupted line. Group II (GII) spindle afferents also have a weak monosynaptic connection (dashed line) to these motoneurones. An oligosynaptic inhibition (of extensor motoneurones: interrupted line) occurs when "gated". These afferents have a powerful (oligosynaptic) excitatory action on homonymous γ-motoneurones (e.g., Noth and Thilmann 1980; Appelberg et al. 1983), and a matched inhibitory one on Renshaw cells (Meyer-Lohmann et al. 1976; Fromm et al. 1977; see also Windhorst 1988).

2.1 Skeletal Muscle 

Specific assumptions and definitions: 

1) The primary output of skeletal muscle is force (denoted by $\mathbf{F}(t)$). This is determined by three major variables: neural excitation (and its history), muscle fibre length and its time derivative (shortening or lengthening velocity).

2) The neural input (skeleto-motoneurone output) is a multi-dimensional signal, the space being spanned by at least three basis vectors: (a) a spatial or topographic dimension (since the signals carried on motor axons are distributed to different portions of the muscle), (b) a recruitment dimension (variable number of active motor axons), and (c) a rate dimension (variable discharge rate on each motor axon). Hence, it is a vector variable, denoted by $\mathbf{d}_{MN}(t)$. Also, skeletal muscle is a time-varying system, i.e., its internal state and output $\mathbf{F}(t)$ depend on the previous history of muscle excitation. This is represented by a delay variable $\delta$ in $\mathbf{d}_{MN}(t,\delta)$.

3) According to the well-known length-tension and force-velocity relations, the active force production of each muscle fibre depends on its length $d_{MF}$, as well as on the lengthening (positive) or shortening (negative) velocity $v = \mathrm{d}d_{MF}/\mathrm{d}t$ (e.g., Talbot and Gessner 1973; Partridge and Benton 1981; McMahon 1984). It is important to emphasize muscle fibre length because it often shows a complicated relation to whole muscle length due to a complex internal muscle architecture (including pinnation; see Gans and Bock 1965; Partridge and Benton 1981; Sacks and Roy 1982). Indeed, muscle fibre length often changes nonlinearly with whole muscle length (Meyer-Lohmann et al. 1974; Windhorst et al. 1976), and pinnation angle varies in different muscle regions (for examples see Windhorst et al. 1989). Hence, muscle fibre length and its derivative are (spatially) distributed variables and are therefore generally denoted as vector variables $\mathbf{d}_{MF}(t)$ and $\mathbf{v}(t) = \dot{\mathbf{d}}_{MF}(t)$. They may be considered as boundary conditions.

Assume that the dynamic behaviour of the muscle system is characterized by a number of state variables $x_i(t)$, $i = 1, \dots, n$. It can then be described by a set of $n$ first-order differential equations:

$$\frac{dx_i(t)}{dt} = M_i\{x_1(t), \dots, x_n(t);\ \mathbf{d}_{MN}(t,\delta), \mathbf{d}_{MF}(t), \mathbf{v}(t)\}, \qquad i = 1, \dots, n.$$

Since the $x_i(t)$ can be combined in a state vector $\mathbf{x}(t)$, the above system of equations can be written compactly as:

$$\frac{d\mathbf{x}(t)}{dt} = \mathbf{M}\{\mathbf{x}(t), \mathbf{d}_{MN}(t,\delta), \mathbf{d}_{MF}(t), \mathbf{v}(t)\} \qquad (1a)$$

The output (force) vector is given as

$$\mathbf{F}(t) = \mathbf{N}\{\mathbf{x}(t), \mathbf{d}_{MN}(t,\delta), \mathbf{d}_{MF}(t), \mathbf{v}(t)\} \qquad (1b)$$

That is, the temporal change in the (internal system) state vector $\mathbf{x}(t)$, as expressed by the first equation (1a), as well as the system output $\mathbf{F}(t)$ (Eq. 1b), are determined by the present state value and by the input vector (muscle excitation $\mathbf{d}_{MN}(t)$, muscle fibre length $\mathbf{d}_{MF}(t)$ and velocity $\mathbf{v}(t)$).
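A system written in the form (1a)-(1b) can be simulated by integrating the state equation step by step and evaluating the output equation along the way. The sketch below uses forward-Euler integration. The concrete $\mathbf{M}$ and $\mathbf{N}$ are hypothetical stand-ins: a single activation state relaxing toward the excitation with time constant `TAU`, and a force proportional to activation that ignores length and velocity. The history dependence in $\mathbf{d}_{MN}(t,\delta)$ would have to be represented by additional state variables.

```python
TAU = 0.05    # hypothetical activation time constant [s]
F_MAX = 10.0  # hypothetical maximal isometric force [N]


def simulate(state_derivative, output, x0, inputs, dt):
    """Forward-Euler integration of dx/dt = M{x, u} (1a) with F = N{x, u} (1b).

    inputs: one tuple u = (d_MN, d_MF, v) per time step of length dt.
    Returns the output sampled before each integration step.
    """
    x = list(x0)
    outputs = []
    for u in inputs:
        outputs.append(output(x, u))
        dx = state_derivative(x, u)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return outputs


def activation_dynamics(x, u):
    """Hypothetical M: the activation x[0] relaxes toward the excitation u[0]."""
    return [(u[0] - x[0]) / TAU]


def force_output(x, u):
    """Hypothetical N: force proportional to activation (u[1], u[2] unused)."""
    return F_MAX * x[0]
```

A step of excitation makes the force rise monotonically toward `F_MAX`. Swapping in richer functions for `activation_dynamics` and `force_output` leaves the integrator unchanged, which is the point of separating (1a) from (1b).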

The dependence of the active force ($F$) of a single fibre or a group of parallel fibres on length ($d_{MF}$) and velocity ($v = \mathrm{d}d_{MF}/\mathrm{d}t$) is usually expressed separately in the two well-known length-tension and force-velocity (Hill) relationships (see, e.g., Talbot and Gessner 1973; Partridge and Benton 1981; McMahon 1984). Carlson (1957) has provided a simple compound mathematical description for whole muscle in which the two above dependencies on length and shortening velocity are represented as additive terms. This approach is here extended and generalized. Thus, assume that $F_0(d_{MN})$ denotes the isometric force at the resting length $d_{MF0}$ and $v = 0$, produced by a given level of neural excitation $d_{MN}$, and that $\Delta d_{MF}$ is the difference between the actual fibre length $d_{MF}$ and its resting length $d_{MF0}$. Then:

$$F(d_{MF}, v) = F_0(d_{MN}) - A(\Delta d_{MF})^2 + \frac{B\,v}{C + v} \qquad (2a)$$

for $v > 0$ (lengthening), and

$$F(d_{MF}, v) = F_0(d_{MN}) - A(\Delta d_{MF})^2 + \frac{D\,v}{E - v} \qquad (2b)$$

for $v < 0$ (shortening). A, B, C, D and E are constants. Note that the second right-hand term, which represents the length dependence, is a simple parabola (see also Woittiez et al. 1983), and that the rightmost term in Eq. 2b represents Hill's hyperbola. This formulation separates the three major determinants of muscle force. It is of interest that the fatiguability of a muscle also depends on its length (Fitch and McComas 1985).
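Eqs. (2a) and (2b) translate directly into a function of the isometric force, the length deviation and the velocity. In the sketch below, $F_0(d_{MN})$ is passed in as an already evaluated number. The constants A to E must be fitted to a particular muscle, so the values used in any example are purely hypothetical. At $v = 0$ both velocity terms vanish, so either branch can handle that case.

```python
def muscle_force(f0, delta_length, v, A, B, C, D, E):
    """Active force following Eqs. (2a)-(2b).

    f0: isometric force F0(d_MN) at resting length and v = 0
    delta_length: d_MF - d_MF0 (deviation from resting fibre length)
    v: lengthening (> 0) or shortening (< 0) velocity
    A..E: positive model constants (hypothetical, muscle-specific)
    """
    length_term = -A * delta_length ** 2          # parabolic length dependence
    if v >= 0:
        velocity_term = B * v / (C + v)           # lengthening, Eq. (2a)
    else:
        velocity_term = D * v / (E - v)           # shortening, Hill-type, Eq. (2b)
    return f0 + length_term + velocity_term
```

The three additive terms mirror the separation of determinants noted above: excitation enters only through `f0`, length only through `length_term`, and velocity only through `velocity_term`. The sketch does not clip the result, so very large length deviations can yield negative values that a fuller model would have to exclude.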

2.2 Muscle Spindles 

Specific assumptions and definitions: 

1) Muscle fibre length $\mathbf{d}_{MF}(t)$ and velocity $\mathbf{v}(t)$ (vectorial quantities) are boundary conditions co-determining the internal muscle state for force production.

2) These variables are measured by muscle spindles. With $\mathbf{y}(t)$ denoting the spindle state, spindle behaviour can be described by the following two state equations:

$$\frac{d\mathbf{y}(t)}{dt} = \mathbf{O}\{\mathbf{y}(t), \mathbf{d}_{MF}(t), \mathbf{v}(t), \mathbf{u}_1(t,\delta)\} \qquad (3a)$$

$$\mathbf{e}_{MS}(t) = \mathbf{P}\{\mathbf{y}(t), \mathbf{d}_{MF}(t), \mathbf{v}(t), \mathbf{u}_1(t,\delta)\} \qquad (3b)$$

where spindle performance depends on the history of $\mathbf{u}_1(t,\delta)$ (see, e.g., Brown et al. 1969; Gregory et al. 1987; review in Hulliger 1984).

The response of spindle afferents to large-amplitude length changes is complicated and nonlinear (Hasan 1983) and depends on the pattern of fusimotor input (Boyd 1981; Matthews 1981; Hulliger 1984).

2.3 Renshaw Cells 

Specific assumptions and definitions: 

1) Renshaw cells monitor the neural motor input to skeletal muscle, $\mathbf{d}_{MN}(t)$, rather faithfully (Cleveland 1980), thus providing information about this determinant of force production.

A pair of equations describes the Renshaw cell system as follows:

$$\frac{d\mathbf{z}(t)}{dt} = \mathbf{Q}\{\mathbf{z}(t), \mathbf{d}_{MN}(t), \mathbf{u}_3(t)\} \qquad (4a)$$

$$\mathbf{e}_{RC}(t) = \mathbf{R}\{\mathbf{z}(t), \mathbf{d}_{MN}(t), \mathbf{u}_3(t)\} \qquad (4b)$$

Like the neural input to skeletal muscle, the input to Renshaw cells from motor axon collaterals, $\mathbf{d}_{MN}(t)$, varies in three dimensions: a spatial coordinate, recruitment and rate.

2.4 Combined Feedback

The recurrent and proprioceptive signals, $\mathbf{e}_{RC}(t)$ and $\mathbf{e}_{MS}(t)$, combine in commonly contacted central neurones into a compound signal, $\mathbf{s}_p(t)$:

$$\mathbf{s}_p(t) = \mathbf{S}\{\mathbf{e}_{RC}(t), \mathbf{e}_{MS}(t)\}, \qquad (5)$$

which interacts with other inputs. For instance, in α-motoneurones the signal $\mathbf{d}_{MN}(t)$ is a function of $\mathbf{s}_p(t)$ and $\mathbf{u}_4(t)$:

$$\mathbf{d}_{MN}(t) = \mathbf{T}\{\mathbf{s}_p(t), \mathbf{u}_4(t)\}, \qquad (6)$$

where the transformation of $\mathbf{u}_4(t)$ into $\mathbf{d}_{MN}(t)$ is thought to be adjustable by $\mathbf{s}_p(t)$ in a nonlinear way (see Windhorst 1990).

3 Are Renshaw Cells Central Models of Neurally Generated 
Muscle Force Production? 

The remainder of this paper discusses to what extent Renshaw cell discharge reflects the neurally generated muscle force output. As yet, there is only sparse indirect evidence suggesting a similarity of both outputs with respect to the recruitment and rate of motoneurone output.

3.1 Recruitment

Muscle. In many steady or slow muscle contractions of increasing force, motor unit recruitment occurs in an ordered fashion. Recruitment starts with S-type (slowly contracting, fatigue-resistant) motor units, proceeds with FR- (fast, fatigue-resistant) and F(int)-type (fast, intermediate fatigue resistance) motor units and ends with FF-type (fast, fatiguable) motor units (e.g., Burke 1980). Twitch and tetanic contraction forces of the motor units increase in about the same order. Based on the measurement of isometric tetanic forces of motor units of the cat flexor digitorum longus muscle and assuming a recruitment strictly ordered according to increasing tetanic force, the cumulative muscle force output is depicted by the open circles in Fig. 4 A. The upper portion of this relationship can be approximated by a straight line as indicated. The dashed line above (labelled b) represents a model of random recruitment (see Bagust et al. 1973). Since actual recruitment does not take place precisely according to tetanic motor-unit force (and since some nonlinear interaction may occur between motor units; see Demiéville and Partridge 1980; Niemann et al. 1986), the true cumulative force will probably lie somewhere between the two relationships, but closer to the lower one.

Fig. 4 A-B: Cumulative effects of recruiting increasing percentages of the skeletomotor axons supplying a muscle (abscissa) on muscle force (A) and Renshaw cells (B). Note the logarithmic ordinates. The open circles in part A plot the cumulative muscle tension resulting from summing up the tetanic tensions of the individual motor units in the cat flexor digitorum longus in an order from low to high tetanic tension (orderly recruitment). A straight line was fitted (by eye) to the upper part of the data points, indicating an exponential relationship (q represents the portion of the motor pool recruited). The dashed line labelled b represents a random recruitment model (data points and dashed line replotted from Bagust et al. 1973, their Fig. 1D). Part B shows, on the same abscissa as in part A, the (normalized) cumulative excitatory input to Renshaw cells ($e_{RC}$) during orderly motor axon recruitment. If Renshaw cells behaved linearly, their output (spike rate) would obey the same relationship. However, they usually exhibit saturation with increasing excitation from parallel inputs, which has been taken into account in the upper relationship labelled b (data extracted from Hultborn et al. 1988, their Figs. 3 and 6a). Note the similarity between the fitted straight lines in parts A and B.

Renshaw cells. The excitatory input to Renshaw cells from motor axon collaterals is also differentiated according to motor unit type, increasing from S- to FF-motoneurones (Cullheim and Kellerth 1978b; Hultborn et al. 1988). Recent results of Hultborn et al. (1988) suggest that this input on average increases with the number of activated motor axons as shown by the open circles (labelled a) in Fig. 4 B (although there is much variance in the data from individual cells). If Renshaw cells converted this input proportionally into firing rate, the input-output relationship would have the same shape. However, since inputs may interact nonlinearly, the Renshaw cell output follows the relationship represented by the dashed line (labelled b). As the nonlinear interaction appears to have been particularly strong due to the testing procedure used by Hultborn et al. (1988), the actual dependence of cumulative Renshaw cell output on the number of recruited motor axons probably lies between the two relationships, possibly closer to the lower one. Note the similarity of the curves labelled a in parts A and B.
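The orderly-recruitment construction behind Fig. 4 A (summing the tetanic tensions of individual units from smallest to largest) and a random-order counterpart can be sketched as follows. The motor-unit tensions are hypothetical (a log-uniform draw over a roughly 100-fold range), not the Bagust et al. (1973) data; the sketch only illustrates why a random-recruitment curve lies above the orderly one: the sum of the n smallest tensions can never exceed the sum of any other n tensions.

```python
import random


def cumulative_force(tensions):
    """Running sum of unit tensions, normalized so the final value is 1.0."""
    total = sum(tensions)
    running, out = 0.0, []
    for t in tensions:
        running += t
        out.append(running / total)
    return out


# Hypothetical motor pool: 100 units whose tetanic tensions span about a
# 100-fold range (log-uniform draw; an illustrative assumption, not real data).
rng = random.Random(1)
pool = [10 ** rng.uniform(0.0, 2.0) for _ in range(100)]

orderly = cumulative_force(sorted(pool))  # smallest units recruited first
shuffled = pool[:]
rng.shuffle(shuffled)
random_order = cumulative_force(shuffled)  # one random recruitment order

for frac in (0.25, 0.5, 0.75, 1.0):
    n = int(frac * len(pool))
    print(f"{frac:.2f} of pool: orderly {orderly[n - 1]:.3f}, "
          f"random {random_order[n - 1]:.3f}")
```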

3.2 Rate 

Muscle. As is well known, the rate dependence of muscle force production is nonlinear. Figure 5 shows several examples. The measurements in Fig. 5 A-B were made on the slow cat soleus muscle under isometric conditions. The data show the frequently observed S-shaped relationship between activation rate and force. Usually, however, the lowest regular firing rates of motor units at their recruitment are between ca. 5 and 12 pps (Freund 1983). Hence, the points at the very low rates in Fig. 5 A and B are not important physiologically. For this reason, and for another given below, I have fitted the data with rectangular hyperbolas of the form

$$F = \frac{b\,(f - f_a)}{j + (f - f_a)}$$

where F denotes muscle force, f the activation rate, f_a the abscissa intersection, b the saturation constant (maximal force output approximated at infinite rate), and j the semi-saturation constant, i.e. the rate increment above f_a at which half the saturation is attained. The steepness of the steep part of the rate-force relationship, and its location on the f-axis, are important because they determine the range of most effective rate modulation.

Figure 5 B suggests that both (1) the abscissa intersection (f_a) and (2) the semi-saturation constant j decline with increasing muscle length (joint angle; see legend). The latter constant (j) also declines when, in man, the rate dependence of the ankle dorsiflexor torque (T) exerted by the faster tibialis anterior is determined at increasing plantarflexion (see legend). Very probably, this length dependence results from a slowing of the muscle twitch (particularly its relaxation phase) with increasing muscle length (Rack and Westbury 1969; Marsh et al. 1981). That is, j decreases with decreasing contraction speed. Moreover, Fig. 5 C shows that for the cat soleus muscle the rate-force relationship also depends on dynamic muscle lengthening. Hence, this relationship is strongly co-determined by peripheral factors such as muscle length and velocity. The question then arises whether the Renshaw cell system can reflect this situation at all.

Fig. 5 A-D: Dependence of muscle force and torque on the motor axon activation rate. Part A: The open circles plot mean force of the isometric cat soleus muscle as a function of activation rate (data points replotted from Rack and Westbury 1969, their Fig. 7); they are fitted by a rectangular hyperbola disregarding low-rate values as indicated. Part B: Effect of ankle joint angle (muscle length) on force-rate relationships of the cat soleus [open circles: joint angle 45° (long muscle); filled dots: joint angle 90°; open triangles: joint angle 120° (short muscle); data replotted from Rack and Westbury 1969, their Fig. 9a]. Part C: Force-rate relationships of the cat soleus muscle during isometric contraction (open circles) and dynamic lengthening (filled triangles) at a joint angle of 70° (data replotted from Joyce et al. 1969, their Fig. 4). Part D: Effect of ankle joint angle on the relationship between ankle dorsiflexor torque and rate of supramaximal stimulation of the tibialis anterior muscle in man [open circles: joint 30° plantarflexed (long muscle); filled dots: joint 5° plantarflexed; open triangles: joint 20° dorsiflexed (short muscle); data replotted from Marsh et al. 1981, their Fig. 5 B]. Note the decrease of j with increasing muscle length in both parts B and D.
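A minimal sketch of the rectangular hyperbola used for the muscle data, F = b(f - f_a)/(j + (f - f_a)), may help show what a change in j does. The parameter values are hypothetical (normalized force b = 1, f_a = 3 pps, and j = 12 pps versus j = 6 pps standing in for a shorter and a longer muscle), and setting F = 0 at or below f_a is an assumption of the sketch; the point is only that a smaller j shifts the steep part of the curve toward the low rates (ca. 5 to 12 pps) at which motor units start firing.

```python
def muscle_force(f, b, j, f_a):
    """Rectangular hyperbola F = b (f - f_a) / (j + (f - f_a)).

    f   : activation rate (pps)
    b   : saturation force, approached as f -> infinity
    j   : semi-saturation constant; F = b / 2 at f = f_a + j
    f_a : abscissa intersection (sketch assumption: F = 0 for f <= f_a)
    """
    x = f - f_a
    if x <= 0:
        return 0.0
    return b * x / (j + x)


# Hypothetical parameters: normalized force, same f_a, two values of j.
for label, j in (("shorter muscle, j = 12", 12.0), ("longer muscle, j = 6", 6.0)):
    row = [round(muscle_force(f, 1.0, j, 3.0), 2) for f in (5, 10, 20, 40)]
    print(label, row)
```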

Renshaw cells. The relationship between the activation rate of motor axons (f) and the discharge rate of Renshaw cells (r) can also be described by a rectangular hyperbola (Cleveland 1980; Cleveland et al. 1981):

$$r = \frac{c\,f}{k + f} \qquad (8)$$

where c is the maximal (saturation) Renshaw cell rate attainable with that input, and k denotes the semi-saturation input rate. An example from our own data is illustrated in Fig. 6 A. The distribution of k found by Cleveland (1980) is shown in Fig. 6 B. On average, k appears to be larger than j (cf. Fig. 5), all the more so because the physiologically relevant relation between input (f) and output (r) rates starts well above f = 0 pps (see above). That is, as illustrated in Fig. 6 C, the apparent semi-saturation constant k' becomes larger if it is determined only for the part of the relationship above, say, f = 6 pps.
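One way to make the Fig. 6 C statement concrete is to re-express the same hyperbola (Eq. 8) with its origin moved to a cut-off point (f_0, r_0), where r_0 = c f_0/(k + f_0). This assumes k' is defined in that way; the caption does not say exactly how the dashed-line values were obtained, so the derivation illustrates the direction of the effect rather than reproducing the figure:

```latex
\begin{aligned}
r - r_0 &= \frac{c\,f}{k+f} - \frac{c\,f_0}{k+f_0}
         = \frac{c\,k\,(f-f_0)}{(k+f)(k+f_0)} \\
        &= \frac{c'\,u}{k' + u},
\qquad u = f - f_0,\quad k' = k + f_0,\quad c' = \frac{c\,k}{k+f_0}.
\end{aligned}
```

With c = 200 pps, k = 15 pps and f_0 = 6 pps this gives k' = 21 pps, while the curve still saturates at r_0 + c' = c = 200 pps.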

However, it must be taken into account that the static Renshaw cell input-output relationship is not constant, but is subject to modulating influences (see Fig. 3). There is much evidence that command signals for muscle activation, besides being directed to skeletomotoneurones, also influence Renshaw cells via independent pathways (reviewed in Windhorst 1988). Hultborn and Pierrot-Deseilligny (1979) proposed that Renshaw cells coupled to the human soleus muscle are progressively depressed (from an initially facilitated state) during a continuing ramp-like isometric muscle contraction. Hence, these cells receive augmenting excitatory input (from soleus motor axon collaterals) and inhibitory input (from independent descending pathways; see Fig. 3). Assume that, as a result, Renshaw cell firing is dynamically depressed such that the saturation constant, c, decreases linearly with increasing activation rate, f, during an augmenting contraction (recruitment is disregarded here). As an example, for the lower curve in Fig. 6 D, c decreases from c = 200 pps at f = 0 pps to c = 150 pps at f = 50 pps. The constant k will then appear to be smaller than it would be without the depressing effect. Moreover, this effect may be enhanced by the positive linear correlation between c and k across cells (Cleveland 1980). Finally, k might also be depressed directly by the descending command signal. Whether such effects occur is unknown and remains to be investigated.

Fig. 6 A-D: Dependence of static Renshaw cell firing rates on motor axon activation rates. Part A displays our own measurements from a cat Renshaw cell activated by supramaximal stimulation of motor axons in the posterior biceps nerve (dorsal roots L… cut) at constant but different rates (solid dots). The dots were approximated by a rectangular hyperbola passing through the origin as indicated. The plot in part B shows the distribution of the semi-saturation constants, k, as found by Cleveland (1980; his Fig. 5, with permission); 41 measurements on 32 cells; the arrow on the abscissa indicates the mean value. Part C illustrates the effect on k of disregarding activation rates lower than f = 6 pps (stippled area). The characteristic values of the complete original rectangular hyperbola passing through the origin (c = 200 pps, k = 15 pps) become larger when determined for the same relationship above f = 6 pps (dashed lines). Part D shows the effect, on the same rectangular hyperbola as depicted in part C (upper curve in D), of linearly decreasing c with increasing f (from c = 200 pps at f = 0 pps to c = 150 pps at f = 50 pps). The apparent semi-saturation constant, k, thereby decreases from k = 15 pps to about k = 6 pps.
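The Fig. 6 D argument can be checked numerically with a small sketch: generate the Eq. (8) curve with c = 200 pps and k = 15 pps, generate a depressed version in which c falls linearly with the activation rate f from 200 pps at 0 pps to 150 pps at 50 pps, and fit a plain rectangular hyperbola through the origin to each. The sampling (integer rates 1 to 50 pps), the unweighted least-squares criterion and the grid of candidate K values are assumptions of the sketch; the fitted number therefore need not match the "about 6 pps" of the caption, and only the direction of the effect (a smaller apparent k) is illustrated.

```python
def renshaw_rate(f, c, k):
    """Static Renshaw cell rate r = c f / (k + f) (Eq. 8)."""
    return c * f / (k + f)


def fit_hyperbola(fs, rs, k_grid):
    """Unweighted least-squares fit of r = C f / (K + f) by scanning K.

    For a fixed K the model is linear in C, so the best C has a closed form;
    the (C, K) pair with the smallest squared error over k_grid is returned.
    """
    best = None
    for K in k_grid:
        g = [f / (K + f) for f in fs]
        C = sum(r * gi for r, gi in zip(rs, g)) / sum(gi * gi for gi in g)
        sse = sum((r - C * gi) ** 2 for r, gi in zip(rs, g))
        if best is None or sse < best[0]:
            best = (sse, C, K)
    return best[1], best[2]


def c_depressed(f):
    """Saturation constant falling linearly: 200 pps at f = 0, 150 pps at f = 50."""
    return 200.0 - f


fs = list(range(1, 51))                       # assumed sampling: 1..50 pps
k_grid = [0.5 + 0.1 * i for i in range(500)]  # candidate K values, 0.5..50.4 pps

plain = [renshaw_rate(f, 200.0, 15.0) for f in fs]
depressed = [renshaw_rate(f, c_depressed(f), 15.0) for f in fs]

print("constant c:   C = %.1f, K = %.1f" % fit_hyperbola(fs, plain, k_grid))
print("depressed c:  C = %.1f, K = %.1f" % fit_hyperbola(fs, depressed, k_grid))
```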

Also, as illustrated in Fig. 3, group II afferents from secondary muscle spindle endings possibly inhibit homonymous Renshaw cells (Meyer-Lohmann et al. 1976; Fromm et al. 1977). If this has the same effects on c and k as described above for the descending command signal, it could adjust k to muscle length, in a way mimicking the dependence of j on muscle length (Fig. 5). The precise quantitative relationships have still to be determined in the same muscle of the same animal.

Two points should be added: (1) at least as important as the adjustment of static input-output relations is that of dynamic characteristics; (2) the adjustment of recurrent inhibition also affects the signal transfer properties of motoneurones.

4 Concluding Remarks 

The preceding exemplifying and hypothetical considerations suggest the intriguing possibility that (a) recurrent inhibition mediated via Renshaw cells is a central model of neurally generated muscle force production, and (b) the parameters of this model (e.g., c and k) may be adjusted both to special task requirements, via descending commands (u_j(t) in Fig. 2), and to peripheral conditions, via proprioceptive feedback (in this case from group II spindle afferents, or from other afferents such as those from joints). Such a dependence would not be unique, since the stretch reflex dynamics (Weiss et al. 1986) and the relative neural activation of different muscles or muscle parts (Jongen et al. 1989) also depend on joint angle (muscle length).

For lack of space (and of data), the cooperative interaction in central neurones between recurrent inhibition, which yields a prediction of neurally generated muscle force, and spindle afferents, which yield state feedback about two further determinants of muscle force (muscle fibre length and its rate of change), could not be treated here at any length. This remains a task for the future.




4.1 The Degree of Match 

One of the central hypotheses of this paper is that recurrent inhibition and proprioceptive feedback provide a compound feedback to spinal networks signalling the basic determinants of muscle force production. A prerequisite appears to be a certain match between the structural and functional characteristics of the two systems. An extensive comparison in this regard has been given elsewhere (Windhorst 1988). The extent to which properties of the two systems need to be matched is generally not easy to determine because, firstly, recurrent inhibition and proprioceptive feedback monitor different parameters (Ia feedback having to mirror the complicated mechanical periphery) and, secondly, the match would be expected to depend on the specific muscle systems under study. Nevertheless, some differences call for a few comments.

1) The central connections of Renshaw cells and spindle Ia afferents onto motoneurones are similar, albeit not completely isomorphic (Baldissera et al. 1981; Hamm 1990). In the cat hindlimb, recurrent inhibition interconnects more motoneurone pools than do Ia afferents, which appears to be a counterargument to the required match. However: (a) such a match cannot be expected to be completely isomorphic (see above); (b) the distribution of recurrent inhibition (from motor axons to motoneurones) cannot be fairly compared with that of Ia afferent connections, because the latter represent only the second part of the feedback loop, the first being the mechanical coupling between motor units and spindles, which is usually not taken into account in such comparisons; (c) not all the interconnections may be active all the time (being subject to various modulating influences; see above); (d) spindle feedback is also provided by group II afferents, which have a wider (and more complicated) distribution than do Ia afferents (see Windhorst 1988).

2) Recurrent inhibition is absent in some limb muscles, in particular in distal foot and hand muscles (human hand muscles: Person and Kozhina 1978; cat "hand" muscles: Hahne et al. 1988; cat short plantar muscles: Cullheim and Kellerth 1978a). Recurrent inhibition thus appears to be more prevalent in proximal limb muscles (Baldissera et al. 1981). The functional reason may be that muscles around proximal joints have to move larger inertial loads than distal muscles, which simply move the digits. Force control for coping with inertial problems may thus be a major function with which recurrent inhibition is associated.

The connectivity of both spindle group II and Golgi tendon organ afferents is very wide and diffuse, including excitatory and inhibitory effects (e.g., McCrea 1986). This justifies the assumption that secondary spindle endings and Golgi tendon organs play a role in motor control that differs from that played by recurrent inhibition and spindle Ia afferents, probably by providing muscle length and tension signals for coordinating the activities of many limb muscles. In particular, as exemplified for spindle group II afferents in Fig. 3, they could in turn adapt parameters in the above two systems. In other words, spindle group II and tendon organ afferents form a system distinct from that constituted by recurrent inhibition and Ia afferent feedback.

4.2 Common Principles 

Several features put the present suggestion into a wider context, possibly pointing to common design principles in the nervous system.

1) Skeletal muscle force production is determined by a complex nonlinear interaction of various factors. It is proposed here that this interaction is decomposed into these factors (or some of them) by Renshaw cells and Ia afferents (with a possible contribution from group II spindle afferents). By convergence of Renshaw cell and Ia outputs on several spinal neural systems, the separate signals are recomposed, probably in a nonlinear fashion. Similarly, many sensory systems carry feature-specific information on separate channels which then, at some point, converge on integrative systems.

2) This recomposition creates complex representations in multidimensional spaces. Populations of Renshaw cells monitor the activity of many synergistic motoneurones in a way that apparently is related to their proximity and/or functional similarity and capacity for force production. The vectorial output of many Renshaw cells will then represent the expected vectorial force (or torque) output due to neural drive. Likewise, the vectorial output of spindle Ia afferents represents vectorial measures of muscle fibre length and velocity (which need not be exactly collinear with the recurrent vector). The possibly nonlinear summation of these vectorial quantities in spinal postsynaptic neurones will create a new, mixed and more global representation (Hasan 1990). Mixed vectorial quantities are also encoded by motor cortical cells in primates. Cells whose firing is related to shoulder and elbow movements display a broadly tuned discharge response to various directions of hand movement (Georgopoulos et al. 1986, 1989; Kalaska et al. 1989; Kalaska 1990). That is, each neurone has a "preferred direction", with broad tuning around it. The direction of hand movement can be fairly well predicted by the vectorial sum over a whole population of motor cortical cells (Georgopoulos et al. 1986, 1989). In addition to direction, many cells in cortical area 4 also code the direction of the load to be moved. Apparently this load sensitivity is characteristic of neurones related to muscles acting across the shoulder, which control the trajectory of the reaching movement and thus have to cope with arm inertia (Kalaska 1990).
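The population-vector idea cited above (Georgopoulos et al. 1986) can be sketched under the idealizing assumption of cosine tuning: each cell's rate varies as the cosine of the angle between the movement direction and its preferred direction, and the population vector is the sum of the cells' preferred-direction unit vectors weighted by their rate change relative to baseline. The baseline, modulation depth and the 16 evenly spaced preferred directions are hypothetical; real populations are noisy and unevenly tuned, so the prediction is only approximate there, whereas in this idealized case it is exact.

```python
import math


def population_vector(movement_dir, preferred_dirs, baseline=20.0, depth=15.0):
    """Direction (rad) of the population vector of cosine-tuned cells.

    Each cell fires baseline + depth * cos(movement_dir - preferred_dir).
    Its contribution is its preferred-direction unit vector weighted by the
    rate change relative to baseline; the contributions are summed.
    """
    x = y = 0.0
    for pd in preferred_dirs:
        rate = baseline + depth * math.cos(movement_dir - pd)
        weight = rate - baseline
        x += weight * math.cos(pd)
        y += weight * math.sin(pd)
    return math.atan2(y, x)


# Hypothetical population: 16 cells with evenly spaced preferred directions.
prefs = [2 * math.pi * i / 16 for i in range(16)]
for deg in (0, 45, 100, 250):
    est = population_vector(math.radians(deg), prefs)
    print(deg, "->", round(math.degrees(est) % 360, 3))
```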



References 

1. Appelberg, B., Hulliger, M., Johansson, H., Sojka, P.: Actions on γ-motoneurones elicited by electrical stimulation of group II muscle afferent fibres in the hind limb of the cat. J. Physiol. (Lond.) 335: 255-273 (1983)

2. Bagust, J., Knott, S., Lewis, D.M., Luck, J.C., Westerman, R.A.: Isometric contractions of motor units in a fast twitch muscle of the cat. J. Physiol. (Lond.) 231: 87-104 (1973)

3. Baldissera, F., Hultborn, H., Illert, M.: Integration in spinal neuronal systems. In: The nervous system. Motor control (V.B. Brooks, ed.), pp. 509-595 (Handbook of physiology, Sect. 1, Vol. II). Bethesda: Amer. Physiol. Soc. 1981




4. Boyd, I.A.: The action of the three types of intrafusal fibre of isolated cat muscle spindles on the dynamic and length sensitivities of primary and secondary sensory endings. In: Muscle receptors and movement (A. Taylor, A. Prochazka, eds.), pp. 17-32. London: Macmillan 1981

5. Brown, M.C., Goodwin, G.M., Matthews, P.B.C.: Aftereffects 
of fusimotor stimulation on the response of muscle spindle 
primary afferent endings. J. Physiol. (Lond.) 205: 677-694 
(1969) 

6. Burke, R.E.: Motor unit types: functional specializations 
in motor control. Trends Neurosci. 3: 255-258 (1980) 

7. Carlson, F.D.: Kinematic studies on mechanical properties 
of muscle. In: Tissue elasticity (J.W. Remington, ed.), pp. 
55-72. Washington DC: Amer. Physiol. Soc. 1957 

8. Cleveland, S.: Verarbeitung spinal-motorischer Ausgangssignale durch die Renshaw-Zellen. Univ. Düsseldorf: Med. Habil.-Schrift, 1980

9. Cleveland, S., Kuschmierz, A., Ross, H.-G.: Static input-output relations in the spinal recurrent inhibitory pathway. Biol. Cybern. 40: 223-231 (1981)

10. Cullheim, S., Kellerth, J.-O.: A morphological study of the axons and recurrent axon collaterals of cat α-motoneurones supplying different hind-limb muscles. J. Physiol. (Lond.) 281: 285-299 (1978a)

11. Cullheim, S., Kellerth, J.-O.: A morphological study of the axons and recurrent axon collaterals of cat α-motoneurones supplying different types of muscle unit. J. Physiol. (Lond.) 281: 301-313 (1978b)

12. Demiéville, H.N., Partridge, L.D.: Probability of peripheral interaction between motor units and implications for motor control. Amer. J. Physiol. 238 (Regulatory Integrative Comp. Physiol. 7): R119-R137 (1980)

13. Evarts, E.V.: Sherrington's concept of proprioception. 
Trends Neurosci. 4: 44-46 (1981) 

14. Fitch, S., McComas, A.: Influence of human muscle length 
on fatigue. J. Physiol. (Lond.) 362: 205-213 (1985) 

15. Freund, H.-J.: Motor unit and muscle activity in voluntary 
motor control. Physiol. Rev. 63: 387-436 (1983) 

16. Fromm, C., Haase, J., Wolf, E.: Depression of the 
recurrent inhibition of extensor motoneurons by the action of 
group II afferents. Brain Res. 120: 459-468 (1977) 

17. Georgopoulos, A.P., Schwartz, A.B., Kettner, R.E.: Neuronal population coding of movement direction. Science 233: 1416-1419 (1986)

18. Georgopoulos, A.P., Lurito, J.T., Petrides, M., Schwartz, A.B., Massey, J.T.: Mental rotation of the neuronal population vector. Science 243: 234-236 (1989)

19. Gans, C., Bock, W.J.: The functional significance of 
muscle architecture - a theoretical analysis. Ergeb. Anat. 
38: 115-142 (1965) 

20. Gottlieb, G.L., Agarwal, G.C.: Response to sudden torques about the ankle in man. II. Postmyotatic reactions. J. Neurophysiol. 43: 86-101 (1980)

21. Gregory, J.E., Morgan, D.L., Proske, U.: Changes in size of the stretch reflex of cat and man attributed to aftereffects in muscle spindles. J. Neurophysiol. 58: 628-640 (1987)




22. Hahne, M., Illert, M., Wietelmann, D.: Recurrent inhibition in the cat distal forelimb. Brain Res. 456: 188-192 (1988)

23. Hamm, T.M.: Recurrent inhibition to and from motoneurons innervating the flexor digitorum and flexor hallucis longus muscles of the cat. J. Neurophysiol. (1990) (in press)

24. Hasan, Z.: A model of spindle afferent response to muscle 
stretch. J. Neurophysiol. 49: 989-1006 (1983) 

25. Hasan, Z.: Biomechanics and the study of multijoint movements. In: Freedom to move: dissolving boundaries in motor control (Humphrey, D.R., Freund, H.-J., eds.). Dahlem Konferenzen. Chichester: John Wiley & Sons (1990) (in press)

26. Hulliger, M.: The mammalian muscle spindle and its central control. Rev. Physiol. Biochem. Pharmacol. 101: 1-110 (1984)

27. Hultborn, H., Pierrot-Deseilligny, E.: Changes in recurrent inhibition during voluntary soleus contractions in man studied by an H-reflex technique. J. Physiol. (Lond.) 297: 229-251 (1979)

28. Hultborn, H., Lipski, J., Mackel, R., Wigström, H.: Distribution of recurrent inhibition within a motor nucleus. I. Contribution from slow and fast motor units to the excitation of Renshaw cells. Acta Physiol. Scand. 134: 347-361 (1988)

29. Jongen, H.A.H., Denier van der Gon, J.J., Gielen, C.C.A.M.: Inhomogeneous activation of motoneurone pools as revealed by co-contraction of antagonistic human arm muscles. Exp. Brain Res. 75: 555-562 (1989)

30. Joyce, G.C., Rack, P.M.H., Westbury, D.R.: The mechanical 
properties of cat soleus muscle during controlled lengthening 
and shortening movements. J. Physiol. (Lond.) 204: 461-474 
(1969) 

31. Kalaska, J.F.: What parameters of reaching are encoded by discharges of cortical cells? In: Freedom to move: dissolving boundaries in motor control (Humphrey, D.R., Freund, H.-J., eds.). Chichester: John Wiley & Sons (1990) (in press)

32. Kalaska, J.F., Cohen, D.A.D., Hyde, M.L., Prud'homme, M.: A comparison of movement direction-related activity versus load direction-related activity in primate motor cortex, using a two-dimensional reaching task. J. Neurosci. 9: 2080-2102 (1989)

33. Marsh, E., Sale, D., McComas, A.J., Quinlan: Influence of joint position on ankle dorsiflexion in humans. J. Appl. Physiol. 51: 160-167 (1981)

34. Matthews, P.B.C.: Muscle spindles: their messages and their fusimotor supply. In: The nervous system (V.B. Brooks, ed.), Handbook of physiology, Vol. II, Part 1, pp. 189-228. Bethesda: Amer. Physiol. Soc. (1981)

35. McCrea, D.A.: Spinal cord circuitry and motor reflexes. Exercise Sport Sci. Rev. 14: 105-141 (1986)

36. McMahon, T.A.: Muscles, reflexes, and locomotion. Princeton: Princeton University Press 1984

37. Meyer-Lohmann, J., Henatsch, H.-D., Benecke, R., Hellweg, C.: Muscle stretch and chemical muscle spindle excitation: effects on Renshaw cells and efficiency of recurrent inhibition. In: Understanding the stretch reflex (S. Homma, ed.), Progr. Brain Res., Vol. 44, pp. 223-233. Amsterdam-Oxford-New York: Elsevier (1976)




38. Meyer-Lohmann, J., Riebold, W., Robrecht, D.: Mechanical influence of the extrafusal muscle on the static behaviour of deefferented primary muscle spindle endings in cat. Pflügers Arch. 352: 267-278 (1974)

39. Niemann, U., Windhorst, U., Meyer-Lohmann, J.: Linear and nonlinear effects in the interactions of motor units and muscle spindle afferents. Exp. Brain Res. 63: 639-649 (1986)

40. Noth, J., Thilmann, A.: Autogenetic excitation of extensor γ-motoneurones by group II muscle afferents in the cat. Neurosci. Lett. 17: 23-26 (1980)

41. Partridge, L.D., Benton, L.A.: Muscle, the motor. In: The nervous system (V.B. Brooks, ed.), Handbook of physiology, Vol. II, Part 1, pp. 43-106. Bethesda: Amer. Physiol. Soc. 1981

42. Person, R.S., Kozhina, G.V.: Study of orthodromic and antidromic effects of nerve stimulation on single motoneurones of human hand muscles. EMG Clin. Neurophysiol. 18: 437-456 (1978)

43. Rack, P.M.H., Westbury, D.R.: The effects of length and stimulus rate on tension in the isometric cat soleus muscle. J. Physiol. (Lond.) 204: 443-460 (1969)

44. Sacks, R.D., Roy, R.R.: Architecture of the hind limb muscles of cats: functional significance. J. Morphol. 173: 185-195 (1982)

45. Sherrington, C.S.: On the proprio-ceptive system, 
especially in its reflex aspects. Brain 29: 467-482 (1906) 

46. Talbot, S.A., Gessner, U.: Systems physiology. New York-London-Sydney-Toronto: Wiley 1973

47. Weiss, P.L., Kearney, R.E., Hunter, I.W.: Position 
dependence of stretch reflex dynamics at the human ankle. Exp. 
Brain Res. 63: 49-59 (1986) 

48. Windhorst, U.: How brain-like is the spinal cord? Interacting cell assemblies in the nervous system. Berlin-Heidelberg-New York-London-Paris-Tokyo: Springer-Verlag 1988

49. Windhorst, U.: Do Renshaw cells tell spinal neurones how to interpret muscle spindle signals? In: Afferent control of posture and locomotion (J.H.J. Allum, M. Hulliger, eds.), Progress in Brain Research, Vol. 55. Amsterdam: Elsevier (1990) (in press)

50. Windhorst, U., Hamm, T.M., Stuart, D.G.: On the function of muscle and reflex partitioning. Behav. Brain Sci. (1989) (in press)

51. Windhorst, U., Schmidt, J., Meyer-Lohmann, J.: Analysis of the dynamic responses of deefferented primary muscle spindle endings to ramp stretch. Pflügers Arch. 366: 233-240 (1976)

52. Woittiez, R.D., Huijing, P.A., Rozendal, R.H.: Influence of muscle architecture on the length-force diagram of mammalian muscle. Pflügers Arch. 399: 275-279 (1983)




Analogic Models for Robot Programming 



R. Zaccaria, P. Morasso and G. Vercelli

DIST - University of Genoa - Via Opera Pia, 11A - 16145 Genova, Italy



Abstract. Planning robot movements in an unstructured environment is investigated from the point of view of common sense reasoning as the simulation of a complex dynamical system. We first show how motor redundancy can be solved by simulating passive motion (what we call the Passive Motion Paradigm). We then extend this concept in terms of Abstract Force Fields, which allow the expression of a variety of problems in addition to trajectory formation (force/position control, obstacle avoidance, concurrent tasks, etc.). Furthermore, we define the notion of Hybrid Motor Schema as the composite ensemble of an analogic component (which consists of the "mental" simulation of model dynamics) and a symbolic component (which contains rules and methods for instantiating the analogic component in a given context: goal, environment, expectations, constraints, ...). The paper then describes two specific analogic models for motion planning in more detail. The first is a connectionist formulation of motor planning for a redundant anthropomorphic structure (M-nets). The second is a general framework for planning sequences of actions for assembling objects. Simulation results of preliminary implementations are presented.

1. Introduction 

Planning robot movements in an uncertain, unstructured environment is a typical problem of 
dealing with the real world. This kind of planning cannot be reduced to trajectory formation 
only; it also needs to deal with redundancy, complex 3D modelling, and knowledge-based 
representation. These are all aspects of the same problem, in the sense that the complexity of 
planning should be analyzed in a global manner, and then distributed to a set of concurrent 
computational models. 

The present paper reports a line of research that approaches this problem by taking into account concepts of common sense reasoning, which is inspired by several hypotheses (Steels, 1988):

a. models are just approximations of the real world; 

b. reality requires multiple-level representations; 

c. complexity involves both symbolic and analogic computations; 

d. problems are often "ill-posed", and the solution space is practically infinite and 
constraints tend to be implicit; 

e. world knowledge is rarely complete; 

f. non-linear dynamics is a powerful problem-solving tool.

Summing up, we can single out two components in a common sense system: 

1. An analogic component, which consists of the "mental" simulation of the dynamics of a model which has some degree of analogy with a relevant chunk of the real world;

2. A symbolic component, which contains rules and methods for instantiating the analogic component in a given context (goal, environment, expectations, constraints, ...).

The ensemble of the two components must obviously have some affinity with the notion of schema/frame/script, its peculiarity being the analogic component; we therefore call it a hybrid schema. The rest of the paper is devoted to explaining what we mean by this in the specific domain of robot planning and control (P&C), and presents some preliminary simulation results.

The general concepts are introduced in sections 2 to 4. In section 5 some examples of 
robot motions are discussed. In section 6 a specific case of a neural model for a redundant 
robot is described. In section 7, a framework for using analogic models in planning sequences 
of actions is presented. 

2. Redundancy and the Passive Motion Paradigm 

Redundancy is one of the key aspects to consider when studying the complexity of motor P&C, in both the biological and the artificial case. Redundant systems are characterized by an excess of degrees of freedom with respect to the number of parameters required for specifying a task. For example, only six kinematic parameters are needed to express the location of a rigid tool in the workspace, whereas the human arm has at least seven degrees of freedom, not counting the fingers and the shoulder-girdle complex.

Redundancy of motor systems introduces some complex mathematical problems, since the kinematic equations do not admit unique, well-defined inverse solutions. However, the effort of facing these difficulties is repaid by increased flexibility: motor redundancy endows the motor planner with a rich repertoire of coordinated strategies (instead of a unique solution) for any single kinematic task. In a sense, redundancy appears to be the key to the versatility of human movements, and for this reason it attracts increasing interest in robotics research.

Several investigators of biological systems have focussed on the role of muscle mechanical properties in motor control, suggesting that muscles are mechanically analogous to "tunable" springs, i.e. they are characterized by a set of integrable functions between length and tension at steady state. Neural input to each muscle selects a particular function out of this set. For any given setting of muscle activations, the equilibrium position of a joint is then defined as the position at which the length-dependent forces of opposing muscles generate equal and opposite torques about the joint, and the joint stiffness follows from the slopes of those length-tension functions at that position. Postures are equilibrium configurations that "attract" different body segments. This view of posture has more recently been extended to the analysis of movement and trajectory formation: as the distribution of neuromuscular activity varies in time, a sequence of attractors (virtual positions) is defined, thereby generating a movement and/or modulating a contact force. This approach to motor control is readily applicable to artificial systems by characterizing the actuators (together with their local controllers) as elastic elements with selectable stiffness (or compliance) properties.
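In the simplest idealization of this spring-like view, each muscle acting on a single joint is a linear torsional spring whose stiffness k and rest angle lam are set by its neural input. The sketch below (function names and numerical values are hypothetical; linear springs ignore that real muscles can only pull and have nonlinear length-tension curves) computes the joint equilibrium, where the two torques cancel, and the resulting joint stiffness.

```python
def joint_equilibrium(k_ag, lam_ag, k_an, lam_an):
    """Equilibrium angle and stiffness of a joint spanned by two linear springs.

    Each 'muscle' is idealized as a torsional spring exerting -k * (theta - lam):
    k is its stiffness and lam the rest angle selected by its neural input.
    The net torque vanishes at theta_eq; joint stiffness is k_ag + k_an.
    """
    theta_eq = (k_ag * lam_ag + k_an * lam_an) / (k_ag + k_an)
    stiffness = k_ag + k_an
    return theta_eq, stiffness


def net_torque(theta, k_ag, lam_ag, k_an, lam_an):
    """Sum of the two spring torques at joint angle theta."""
    return -k_ag * (theta - lam_ag) - k_an * (theta - lam_an)


# Hypothetical settings (angles in rad, stiffness in N*m/rad).
print(joint_equilibrium(2.0, 0.2, 2.0, 0.6))  # low co-activation
print(joint_equilibrium(4.0, 0.2, 4.0, 0.6))  # co-contraction: same posture, stiffer
print(joint_equilibrium(4.0, 0.2, 2.0, 0.6))  # unbalanced activation: posture shifts
```

Scaling both stiffnesses together (co-contraction) leaves the equilibrium posture unchanged while raising joint stiffness; changing their balance moves the equilibrium, which is the sense in which a sequence of such settings defines a sequence of virtual positions.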

In previous papers (Marino et al 1985, 1986, Morasso and Mussa Ivaldi 1986, Morasso et 
al 1987, 1988, Mussa Ivaldi et al 1988) we have shown that muscle elastic properties lead to a 
natural representation and regularization of redundancy. In essence, redundancy resolution 
derives from the simulation of an externally imposed motion (for example a displacement of 
the end-effector) which represents the selected task. Given such a perturbation, a physical 
system characterized by elastic actuators will settle to a well-defined configuration that
corresponds to a minimum of potential energy compatible with the externally imposed
motion. A full trajectory is generated by iterating this process with small (infinitesimal)
position changes along the desired path. At the end of each displacement, the control inputs
are reset so as to maintain the system at equilibrium. This regularization process has been
called the Passive Motion Paradigm (PMP). In contrast with conventional approaches to
redundancy (e.g. the application of standard Jacobian pseudo-inverses or the optimization of
motions in the null space), PMP leads to a class of integrable local solutions that can be
adapted to the specific context by an appropriate choice of actuator elastic properties.
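The relaxation just described can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a hypothetical planar arm with three unit-length links, which leaves one redundant degree of freedom for a 2-D reaching task. It also assumes an elastic field of stiffness `k_task` that pulls the end-effector toward the target, and a diagonal joint admittance `admittance` standing in for actuator compliance. At each small step the task-space force is mapped to joint torques with the transposed Jacobian, and every joint yields in proportion to its torque, so no pseudo-inverse is ever computed.

```python
import numpy as np

# Hypothetical planar arm: three unit links, i.e. one degree of redundancy
# for a two-dimensional reaching task.
LINKS = np.array([1.0, 1.0, 1.0])


def forward_kinematics(q):
    """End-effector position of the planar serial arm for joint angles q."""
    angles = np.cumsum(q)
    return np.array([np.sum(LINKS * np.cos(angles)),
                     np.sum(LINKS * np.sin(angles))])


def jacobian(q):
    """2 x n Jacobian of forward_kinematics with respect to q."""
    angles = np.cumsum(q)
    J = np.zeros((2, len(q)))
    for i in range(len(q)):
        # joint i rotates every link from i onwards
        J[0, i] = -np.sum(LINKS[i:] * np.sin(angles[i:]))
        J[1, i] = np.sum(LINKS[i:] * np.cos(angles[i:]))
    return J


def pmp_relax(q, target, k_task=1.0, admittance=(1.0, 1.0, 1.0),
              dt=0.05, steps=2000, tol=1e-6):
    """Let the arm be pulled passively toward `target` (PMP-style sketch).

    An elastic field in task space gives f = k_task * (target - x); the
    joints feel tau = J^T f and yield with velocity A @ tau, where A is an
    assumed diagonal joint admittance (the inverse of a joint stiffness).
    """
    q = np.array(q, dtype=float)
    A = np.diag(admittance)
    target = np.asarray(target, dtype=float)
    for _ in range(steps):
        error = target - forward_kinematics(q)
        if np.linalg.norm(error) < tol:
            break
        tau = jacobian(q).T @ (k_task * error)
        q += dt * (A @ tau)
    return q


q_final = pmp_relax([0.3, 0.3, 0.3], target=[1.5, 1.5])
print(forward_kinematics(q_final))  # approximately [1.5, 1.5]
```

You can change `admittance`, for instance by making one joint stiffer. This would typically lead the arm to a different joint configuration with the same end-effector position. That is the sense in which the choice of elastic properties selects among the redundant solutions.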

3. Abstract Force Fields 

According to the principle illustrated above, planning a human movement, i.e. generating the 
synergy of muscle activations, is an internal simulation of passive movement, or a motor 
"relaxation" between several different postures. 

However, the power of this principle is that we can abstract from the specific physical
nature of the underlying process (the viscous-elastic properties of the actuators). Even if the
actuators do not have these properties, we can still attribute a virtual compliance to them,
which can have a natural interpretation in any specific planning circumstance.

The general concept is that we can solve motor redundancy by means of a multidimensional
force field, which can either directly model a physical substrate or be an abstract
representation: an abstract force field (AFF).

From the motor P&C point of view, attractor postures are representations that can be
programmed incrementally (with respect to the current posture) to obtain two complementary
results in configuration space: the generation of prescribed trajectories (if the motion is free)
or of prescribed interaction forces (if the intended movement is inhibited by external
constraints). The relationship between the two entities is given by the stiffness
matrix:



$$\tau = K\,\delta q$$

where $\delta q$ represents a small displacement from an attractor posture, $\tau$ is the
corresponding restoring force/torque vector and $K$ is the stiffness matrix. The generality of
this formulation lies in the fact that we can build upon it different representational layers that
capture diverse aspects of motor P&C in the same framework, by superimposing (in space and
time) several AFFs. For example, the constraints due to joint limits can easily be taken into
account by an AFF in which repulsive forces steeply increase the potential function in the
neighborhood of forbidden joint configuration regions. The problems that plague force control
(e.g. instability) are overcome because interaction forces are not controlled explicitly but arise
implicitly from the intrinsic, controllable stiffness/compliance of the system.
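A joint-limit AFF of the kind mentioned above can be sketched as the superposition of two fields. The first is an attracting field toward an attractor posture, given by the stiffness relation above. The second is a repulsive field that is zero in the safe range and grows like $1/d$ as the distance $d$ to a limit shrinks. The barrier shape, `gain`, `margin` and the toy relaxation loop are assumptions chosen for illustration. The text does not specify a particular repulsive law.

```python
import numpy as np


def attractor_torque(q, q_attr, K):
    """Restoring torque K (q_attr - q) of an elastic field centred on q_attr."""
    return K @ (np.asarray(q_attr, dtype=float) - np.asarray(q, dtype=float))


def joint_limit_torque(q, q_min, q_max, gain=0.01, margin=0.2):
    """Repulsive torque that grows steeply as a joint nears its limits.

    Zero farther than `margin` from a limit; closer in it equals
    gain * (1/d - 1/margin), which diverges as the distance d -> 0.
    """
    q = np.asarray(q, dtype=float)
    tau = np.zeros_like(q)
    d_low = np.maximum(q - np.asarray(q_min, dtype=float), 1e-9)
    d_high = np.maximum(np.asarray(q_max, dtype=float) - q, 1e-9)
    near_low = d_low < margin
    near_high = d_high < margin
    tau[near_low] += gain * (1.0 / d_low[near_low] - 1.0 / margin)     # push up
    tau[near_high] -= gain * (1.0 / d_high[near_high] - 1.0 / margin)  # push down
    return tau


def total_torque(q, q_attr, K, q_min, q_max):
    """Superposition of the attracting field and the joint-limit field."""
    return attractor_torque(q, q_attr, K) + joint_limit_torque(q, q_min, q_max)


# One joint whose attractor (2.0 rad) lies beyond its upper limit (1.5 rad).
K = np.diag([1.0])
q = np.array([0.0])
for _ in range(5000):
    q = q + 0.01 * total_torque(q, [2.0], K, [-1.5], [1.5])
print(q)  # settles just inside the upper limit
```

Because the attractor lies outside the joint range, the joint does not reach it. It settles where the two fields balance, slightly inside the upper limit, without any explicit constraint handling.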

Other P&C issues, for example trajectory formation or obstacle avoidance, are better
formulated in 3D space, and this implies a mapping of the AFF between configuration space and
3D space. This mapping must satisfy the principle of invariance of virtual work:

$$\delta q \cdot \tau = \delta x \cdot f$$

where $\delta q$ and $\delta x$ are virtual displacements in the two spaces and $\tau$ and $f$ are
the corresponding force vectors. From this it is possible, in principle, to determine the passive
behaviour of the system when a disturbing force $f$ or a passive displacement $\delta x$ is
applied: $\delta q$ then identifies the resulting displacement from the attractor posture, and
$\tau$ is the corresponding restoring force vector.
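The step from virtual work to a usable force mapping can be made explicit. Assume, as is usual for a serial chain, that the two virtual displacements are related by a Jacobian $J(q)$, with $\delta x = J(q)\,\delta q$. The Jacobian is not named in the text. Under that assumption the invariance of virtual work gives:

$$\delta q^{T}\tau = \delta x^{T} f = \delta q^{T} J(q)^{T} f \quad \text{for all } \delta q \quad\Longrightarrow\quad \tau = J(q)^{T} f$$

In this reading, a field defined in 3D space induces a configuration-space field through $J^{T}$ alone, so no inversion of the non-square Jacobian is needed. This is one way to see why the formulation is insensitive to redundancy.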

The power of the method is threefold: we reduce free movements and interaction
movements to the same paradigm, we overcome the redundancy issue (planned trajectories are
guided in the direction of minimum potential energy), and we open the door to a variety of
generalizations of the $\langle f \mid \delta x \rangle$ representation. At the basic level,
$\langle f \mid \delta x \rangle$ specifies an attractor that solves the inverse kinematic problem
of a redundant kinematic chain. More general is the case of a whole humanoid, composed of a
set of (possibly closed) kinematic chains: this requires segmenting the
$\langle f \mid \delta x \rangle$ representation in order to specify an attractor for each limb of
the humanoid. Moreover, obstacle avoidance can be expressed as a repulsive force field
emanating from the obstacles, superimposed on the attracting field associated with the target.
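The sketch below makes the superposition of attracting and repulsive fields concrete. It moves a point in the plane, for instance an end-effector reference point, under a linear attractive field toward the target plus the repulsive field of a circular obstacle. The repulsive term is the classic artificial-potential form used in robotics (a Khatib-style potential), swapped in here purely as an illustration. `eta`, `influence`, the obstacle geometry and the overdamped integration are all assumptions.

```python
import numpy as np


def attractive_force(x, target, k=1.0):
    """Elastic field pulling the point toward the target."""
    return k * (np.asarray(target, dtype=float) - x)


def repulsive_force(x, center, radius, eta=0.05, influence=0.5):
    """Repulsion from a circular obstacle, active within `influence` of its surface.

    Negative gradient of 0.5 * eta * (1/d - 1/influence)**2, where d is the
    distance from the point to the obstacle surface.
    """
    diff = x - np.asarray(center, dtype=float)
    dist_center = np.linalg.norm(diff)
    d = dist_center - radius
    if d >= influence:
        return np.zeros(2)
    d = max(d, 1e-6)
    magnitude = eta * (1.0 / d - 1.0 / influence) / d ** 2
    return magnitude * diff / dist_center


def navigate(start, target, obstacles, dt=0.01, steps=5000):
    """Overdamped motion (velocity proportional to force) in the summed field."""
    x = np.array(start, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        f = attractive_force(x, target)
        for center, radius in obstacles:
            f = f + repulsive_force(x, center, radius)
        x = x + dt * f
        path.append(x.copy())
    return np.array(path)


path = navigate([0.0, 0.0], [2.0, 0.0], obstacles=[((1.0, 0.3), 0.2)])
print(path[-1])  # ends near the target after bending around the obstacle
```

The obstacle is deliberately placed slightly off the straight line to the target. With a perfectly symmetric layout, the point could stall in a local minimum, the same kind of dead end discussed for assembly planning in section 7.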

The technique of overlapping multiple AFFs (some of them intended to attract and some
to repel the motion of the robot) can be extended to the time axis, allowing the smooth
composition of complex trajectories driven by discrete instantiations of guiding postures/spatial
patterns, in a way similar to the segmentation of continuous speech into overlapping phonemes,
or of cursive handwriting into overlapping graphemes. Furthermore, the technique can be
used for both off-line planning and on-line correction.
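Overlapping fields along the time axis can be sketched by giving each guiding posture a bell-shaped activation window. The system then relaxes toward the activation-weighted sum of their elastic fields. The Gaussian windows, the stiffness `k` and the timing values are assumptions, not taken from the text. The point is only that discrete postures produce a continuous path that passes near, but not exactly through, the intermediate ones.

```python
import numpy as np


def activation(t, center, width):
    """Bell-shaped time window of one attractor (an assumed Gaussian profile)."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)


def blended_force(x, t, postures, centers, width, k=5.0):
    """Sum of elastic fields, each scaled by its activation at time t."""
    f = np.zeros_like(x)
    for posture, center in zip(postures, centers):
        f += k * activation(t, center, width) * (np.asarray(posture, dtype=float) - x)
    return f


def compose(start, postures, centers, width=0.3, dt=0.005, t_end=4.0):
    """Overdamped motion driven by temporally overlapping attractors."""
    x = np.array(start, dtype=float)
    trajectory = [x.copy()]
    for step in range(int(t_end / dt)):
        x = x + dt * blended_force(x, step * dt, postures, centers, width)
        trajectory.append(x.copy())
    return np.array(trajectory)


# Three guiding postures, activated around t = 1, 2 and 3 with overlapping windows.
traj = compose([0.0, 0.0], [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)], centers=[1.0, 2.0, 3.0])
print(traj[-1])  # close to the last posture (0, 1)
```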

Common sense provides a methodological guideline: complex dynamics is better handled
if data are explicitly separated from processes, i.e. from equations. In this sense we can draw
an analogy between the abstract data type concept in software engineering and the AFF
concept considered in this paper.

4. Hybrid Motor Schemas 

Hybrid motor schemas (HS) are intended to capture the complexity of actions in the real
world. The term "hybrid" refers to the integration of logic/symbolic and analogic/simulative
representations. Consider, for example, "grasping" performed by a dexterous hand. Its
structure can be detailed from different points of view:

1. There are different grasping modes;

2. The shape of objects and their physical nature provide several virtual "handles"; 

3. The layout of the working environment provides constraints on the approach phase and 
the selection phase. 

Grasping modes are general strategies for grasping, regardless of the particular points
("handles") at which contact will be made: two-finger, palm, cylindrical, pencil-type and
so on.

Handles are the possible known points for stable grasping, defining families related to 
grasping modes by higher level semantic functions. 

Finally, grasp planning must relate the current position, the position/orientation and size of
the object, the selected grasping mode and handles, and the working environment and its
constraints, giving as output a trajectory in the n-dimensional joint space.

A grasping schema needs analogic representations for the different grasping modes,
parameterized so that they can be adapted to the specific dimensions of the handles. It also
requires rules for selecting grasping modes and parameters according to the context, and
procedural attachments for activating the motor tasks.

While the purpose of activating schemas is usually that of directly producing inferences, 
the purpose of a hybrid motor schema, such as the grasping schema, is to set up an AFF whose 
simulation details the plan of movement. The coordination among all the fingers as a function 
of the specific shape of the grasped object, for example, is not explicitly represented in the 
schema but is an indirect result of simulating the complex dynamics of the force field. The 
different phases of the planned actions (e.g. approach and grasp) can be joined together by 
singling out a limited set of "key postures" (Morasso, Vercelli, Zaccaria, 1988). In summary, a 
hybrid motor schema is a representation of the whole movement or a particular chunk of it 
which is conceptually divided into two parts: 

• an analogic part, i.e. kinematic and/or dynamic and/or eidetic descriptions;

• a symbolic part, i.e. micro and/or macro features that specify aspects of the movement.

The analogic part of the Grasping Schema is basically composed of two parts: the 3D-iconic 
representation of the scene (start and end positions of the hand, obstacles), and processes: 
those which adapt the grasping mode to the particular object, those which set up the AFFs 
connected to target and obstacles, and the AFFs that drive the kinematics of the whole 
structure. 

Unlike classical planning systems, here the behavior of the system is the result of many
cooperating processes. In particular, time is not explicitly described in the HS. The
temporal performance is "observed" while the grasping action is being carried out, driven by
the composition of the AFFs. However, the temporal behavior can be changed by selecting
appropriate values for the field strengths, the stiffness matrices, and so on.

A more detailed analogic model for a redundant anthropomorphic robot based on PMP and 
connectionist concepts will be presented in section 6. 

5. Using AFFs and HSs: Some Examples 

Translating the AFF and HS concepts into practical computational terms is our long-term goal.
We have just begun to set up a few tiles of the mosaic, developing software tools for various
simulation purposes. One such tool is a prototype planner for human grasping movements,
called OCTOPUS. This system is able to solve reaching-grasping problems, starting from a
set of icons (3-D kinematic skeletons) of the hand and from a hybrid schema of the action.

Icons are organized as a knowledge base (KB). They represent the analogic part of the planner.
The planner itself performs symbolic computation, or reasoning, on the icons' properties, which
are also described in the KB.

Figure 1 - arm reaching task, initial and final positions

An icon is usually a virtual target for a part of a movement, as a final or intermediate
posture; in some cases, an icon can be a repulsive posture, representing a forbidden
(e.g. dangerous) configuration. Several icons may be (and usually are) simultaneously targets,
so that none of them is exactly reproduced in the actual motion. The aim of the planner is to:

1. choose the proper icons (related to the particular object and the given grasping
type), e.g. a suitable grasping posture;

2. adapt the icons to the actual problem (changing parameters so that they correspond to
the shapes of objects), e.g. the grasping posture is fitted to the object;

3. set up the appropriate force fields associated with the set of icons (which may include
obstacles, constraints, etc.).

The planner carries out its task on the symbolic "summaries" of the icons, also stored in the
KB. After the planner has decided on and set up the analogic model for the action, a realizing
process starts, which generates the movement. This may be done using the P&C paradigm
described in the previous section. Currently, computation is carried out by procedures written
in the NEM language. The NEM language (Marino et al., 1985, 1986) allows one to build arbitrary
structures of geometric frames, to compute geometric relations among them, and to specify
relative motions in a concurrent way. The KB, in this example, is a multiple-inheritance
semantic network of frames implemented in Lisp; frames are either physical objects or
processes. An action (in our case, a grasping action) is a compound process which connects
the planner to the actual object, computes the attracting icons (e.g., the "handles" of the
object), then sets up the AFFs and supervises the different phases. Analogic procedures are
procedural attachments to concepts in the hybrid KB (Adorni et al. 1988). Figures 1-2 are
simulation outputs of the planner: the reaching task is performed by an inverse kinematic
module, the HS "arm reaching task", while the grasping task is performed by an
interpolation module, the HS "handle grasping task".





Figure 2 - handle grasping task, actions for different objects 

For the organization of such HSs we started building a tool for body models, which encodes
the special geometric constraints of degrees of freedom; a tool for compliant actuators, which
sets up the graph implementing the attractor posture; and a tool for obstacle modelling, which
implements the repulsive force fields as AFFs. The figures show some examples of simulations
performed within this software environment. Figure 3a is an example of inverse kinematics
for a redundant arm (open kinematic chain). Figure 3b extends the example to the parallel
movement of two arms towards two independent targets. Figure 3c shows the generation of a
whole-body movement from two subsequent attractor postures.

The examples in figures 4 and 5 lie outside the anthropomorphic scenario. They show the
expressive power of the AFF approach, regardless of the type of robot, when dealing with
an inherently redundant and/or partially undefined task. The problem of figure 4 is a
pick-and-place task with obstacle avoidance, using a 5-d.o.f. robot arm (not including the
gripper). Grasping modes are stored in the HS, and they are attractor icons together with the
final ("place") position, whereas obstacles are repulsive icons. The whole movement is a
simultaneous evolution of all d.o.f. in a global AFF set up by the planner.

Figure 3 a,b,c - Movements generated by HSs and AFFs: 
a) redundant arm, b) independent targets, c) via attractor postures 

The final problem is planning trajectories for a mobile conventional robot to accomplish
continuous arc welding tasks. Such a task involves 5 d.o.f., whereas the robot has 6 d.o.f., and
additional d.o.f. are typically provided in some other way. Figure 5, for example, shows two
phases of a common task in which two small pipes are to be welded to a bigger (horizontal)
one; the big pipe rotates about its axis and the robot can move along a line parallel to that
axis, hence 8 d.o.f. are available. All d.o.f. must be used to reach the object to be welded and
to follow a "big" trajectory (in our example, the intersection of two pipes of considerable size)
while, at the same time, avoiding collisions and the illegal postures dictated by the welding
hardware.

This problem has been solved for a real robot using an AFF technique (Casalino et al. 1988)
similar to that described above, in which the relevant icons guided the robot movement
through the different phases and "pushed away" the arm from illegal or colliding postures.

6. M-nets 

A modelling framework called "Motor Relaxation Networks" (M-nets for short) has been
formulated to express motor planning for the biological motor control system (Morasso et al.,
1989). This method, which is based on the Passive Motion Paradigm (Mussa Ivaldi et al., 1988),
can be formulated in connectionist terms as a multiple-constraint satisfaction network driven
by a potential function, similarly to Hopfield networks (Hopfield 1982), because it is possible
to define a global function E that has the properties of a potential function. In Hopfield nets
E is a purely abstract computational concept; in M-nets, on the contrary, E is the elastic energy
stored by the muscles, the direct consequence of a physical phenomenon of biological
significance. In biological motor control, a sufficiently realistic M-net may be used as a tool to
formulate predictions of the structure of complex motor synergies. M-nets are therefore
analogical models in a double sense: the computational process is a dynamic system, and
there is a direct analogy between the structure of the model and the underlying physical world.

Figure 4 - Top view of a pick-and-place movement via AFFs

Three types of units are present in M-nets:

- skeletal segment units (s-units), 

- muscle units (m-units), 

- ligament units (l-units).

Let us now briefly consider their structure and function:

1. S-units model the different skeletal body segments, represented as rigid bodies to which 
complex sets of forces (muscle forces, ligament forces, external forces) are applied. 




Figure 5 - An 8 d.o.f. task: continuous arc welding of big pipes 

These units are more complex than those usually considered in connectionist models 
because they deal with vector quantities (force, velocity, and rotation vectors) and 
perform vector operations such as vector products. The correct analogy, for an s-unit, is 
with a whole cortical column, not with a neuron. 

The input is a set of force vectors $(f_1, f_2, \ldots, f_n)$ applied to a set of insertion points
$(P_1, P_2, \ldots, P_n)$ distributed over the skeletal segment.

The outputs are the positions of the same insertion points. 

The transformation operated by an s-unit is characterized by the following steps: 

- compute the resultant force and torque vectors (F and N), i.e. the net vectorial input;

- compute incremental displacement and rotation vectors as sigmoidal functions of F and N,
respectively;

- update the insertion points with a rigid translation/rotation.

2. M-units model the muscles as viscous-elastic elements characterized by a family of
curves

$$f = f(l, \dot{l}; a)$$

indexed by the level of muscle activity $a$ ($f$ is the muscle force, $l$ is the muscle length and
$\dot{l}$ is the speed of contraction)¹. M-units are connected to two s-units in the case of
single-joint muscles and to several s-units in the case of multi-joint muscles (e.g. many 
muscles of the hand). The inputs are the insertion points of the muscle, from which the 
muscle length is computed. Two kinds of outputs are present: one is muscle force, which 
is transmitted to the connected s-units, the other is muscle activation a that represents 
the motor output and should be transmitted to the muscles. During the simulation of 
passive movements, muscle lengthening/shortening is accompanied by variations of a 
that keep the force output approximately constant. 

3. L-units model the joints as elastic connections with high stiffness. Complex joints can
be modelled with multiple l-units, even joints distributed over several skeletal segments,
such as the wrist joint. Another function of l-units is to model joint limits in a soft way,
by generating increasingly strong restoring forces as joints approach forbidden
configurations. Note that joints and joint limits are not represented as hard constraints
but as soft constraints. The power of this concept is that it brings geometry, kinematics,
and actuator dynamics under a uniform representational scheme. From the operational
point of view, l-units are simplified versions of m-units, without the mechanism for
modifying $a$. L-units and m-units can store elastic potential energy.

¹ We used different kinds of models for $f(\cdot)$, derived from the literature (Hatze, 1981).
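The m-unit behaviour described in item 2 can be sketched with a deliberately simple muscle law. The linear form $f = a\,k\,(l - l_0) + b\,\dot{l}$ used below is a hypothetical stand-in, not one of the literature models cited in the footnote. The class and parameter names are likewise illustrative. The `keep_force` method mimics the passive-movement rule: when the length changes, the activation $a$ is reset so that the static force stays constant.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MUnit:
    """Hypothetical muscle unit: f = a * k * (l - l0) + b * l_dot (not Hatze's model)."""
    k: float = 100.0   # stiffness per unit activation
    l0: float = 0.1    # slack length
    b: float = 1.0     # damping coefficient (the dependence on contraction speed)
    a: float = 0.5     # activation, kept in [0, 1]

    def length(self, p_origin, p_insertion):
        """Muscle length computed from its two insertion points (the unit's input)."""
        return float(np.linalg.norm(np.asarray(p_insertion) - np.asarray(p_origin)))

    def force(self, l, l_dot=0.0):
        """Force output fed back to the connected s-units."""
        stretch = max(l - self.l0, 0.0)   # a muscle can pull but not push
        return self.a * self.k * stretch + self.b * l_dot

    def keep_force(self, f_target, l):
        """Reset the activation so the static force stays at f_target at length l."""
        stretch = max(l - self.l0, 1e-9)
        self.a = float(np.clip(f_target / (self.k * stretch), 0.0, 1.0))
        return self.a


m = MUnit()
f0 = m.force(0.3)                  # static force at the initial length
a_new = m.keep_force(f0, 0.35)     # the muscle is lengthened passively
print(f0, a_new, m.force(0.35))
```

In this toy run the muscle is stretched from 0.30 to 0.35. The activation drops from 0.5 to about 0.4, while the force stays at about 10 units, which is the behaviour described for passive movements. The damping term is ignored by `keep_force`, which is a quasi-static simplification.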

M-units and l-units behave as impedances, i.e. they receive positional information and react
by feeding back force information. S-units, on the contrary, behave as admittances, i.e. they
receive force information and react by modifying positional parameters. Compared with usual
connectionist models, m-units and l-units are analogous to connection weights, these weights
having the meaning of stiffnesses: fixed in the case of ligaments and variable in the case of
muscles.

A specialized s-unit is the "environment" (or "ground"). Gravity is represented by fictitious
l-units that link all the regular s-units to ground. Environmental constraints that prevent
the movement of some skeletal segments can also be represented by means of fictitious
l-units, as can motor tasks, which can be specified by means of fictitious l-units that link
goal-oriented s-units (e.g. a distal phalanx in a reaching task) to the environment. Concurrent
tasks are represented in a natural way by simultaneously activating multiple fictitious l-units.

An M-net is an asynchronous network. At equilibrium, the forces that enter each s-unit
yield resultant force and torque vectors that must be null: as a consequence, the
positions of the insertion points of all the muscles and ligaments remain stationary. However,
even when the network is at equilibrium, muscles and ligaments may well be far from
equilibrium and may in fact store considerable amounts of elastic energy.
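An s-unit update, following the three steps listed in item 1 of the unit descriptions, can be sketched for the planar case. The sketch makes several assumptions: the segment's reference point is the centroid of its insertion points, `tanh` serves as the sigmoidal function, and `max_shift`, `max_turn` and `scale` are arbitrary. When the applied forces are balanced, with null resultant force and torque (the equilibrium condition just stated), the update leaves the insertion points where they are. A pure couple produces only a rigid rotation.

```python
import numpy as np


def sigmoid_step(v, max_step, scale):
    """Saturating (sigmoidal) map from a vector to an increment in the same direction."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    return max_step * np.tanh(norm / scale) * v / norm


def s_unit_update(points, forces, max_shift=0.01, max_turn=0.01, scale=1.0):
    """One update of a planar s-unit: net force/torque -> rigid translation/rotation."""
    points = np.asarray(points, dtype=float)
    forces = np.asarray(forces, dtype=float)
    center = points.mean(axis=0)            # assumed reference point of the segment
    F = forces.sum(axis=0)                  # resultant force
    r = points - center
    N = np.sum(r[:, 0] * forces[:, 1] - r[:, 1] * forces[:, 0])  # resultant torque (z)
    shift = sigmoid_step(F, max_shift, scale)
    turn = max_turn * np.tanh(N / scale)
    c, s = np.cos(turn), np.sin(turn)
    R = np.array([[c, -s], [s, c]])
    return center + shift + r @ R.T         # rigid motion of all insertion points


pts = [[0.0, 0.0], [1.0, 0.0]]
balanced = s_unit_update(pts, [[-1.0, 0.0], [1.0, 0.0]])  # equal, opposite, collinear
couple = s_unit_update(pts, [[0.0, -1.0], [0.0, 1.0]])    # pure torque
print(balanced, couple)
```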

M-nets solve the inverse kinematic problem both at the joint level and at the muscle level, 
irrespective of the degree of redundancy. By contrast, most of the methods developed in
robotics for dealing with redundant systems (explicit optimization and/or motions in the null- 
space) are only practical for simple kinematic chains and are not scalable up to the degree of 
complexity of the human musculo-skeletal system. Moreover, M-nets do not solve inverse 
kinematics only. The solution, which consists of the parallel modulation of elastic actuators, 
has built-in compensatory characteristics that may counteract the forces due to dynamics. At 
least for simple arm movements (Flash 1987) it has been shown that this kind of dynamics 
compensation mechanism is sufficient to explain the experimentally recorded deviations from 
the straight line. 

We think that the major source of flexibility of M-nets with respect to other
connectionist models of motor control (Bullock and Grossberg 1988, Jordan 1988, Kawato et
al 1987, Kuperstein 1987) is that M-nets explicitly use the viscous-elastic properties of
muscles. A unit in M-nets plays the role of an entire cortical column in motor cortex, and the
network provides a modelling framework for expressing the motor planning and control of
highly redundant robots actuated by muscle-like devices.

A preliminary version of an M-net simulator was implemented by modifying a motor 
simulation and animation system - NEM - previously developed (Marino, et al., 1985). The 
simulator allows the definition of arbitrarily complex musculo-skeletal systems in three
dimensions, and a special graphic interface allows 3D animation of the simulated tasks on a
graphics workstation. The output of a simulated movement for an anthropomorphic finger is
shown in figure 6. 

Figure 6 - Movement of an anthropomorphic finger using M-nets 

Although M-nets for complex musculo-skeletal structures currently run far from real time,
they could quickly benefit from modern parallel computing techniques. In particular,
implementation on a transputer architecture seems feasible and natural, and this is
actually our next research target.

7. Analogic Planning of Action Sequences 

So far, we have been concerned basically with the generation of laws of movement (AFFs, HSs)
and with following trajectories with complex kinematic chains (PMP). Now let us face the
problem of finding the temporal relationships between motions for achieving a goal. In other
words, the intrinsic constraints of a task must be detected, yielding the sequence of events
necessary to accomplish it. This is the classic field of application of planning in A.I.

In this section some new concepts about planning will be outlined. We start from the ideas 
introduced above, namely, the definition of AFFs, HSs, and PMP. Our aim is to show how 
action planning may be formulated in terms of movements inside Force Fields, producing an
analogic or hybrid model. 

According to the PMP concept, we can realistically restrict ourselves to the formation of 
trajectories for each part to be handled, as if they could fly in space. Once the "flight 
trajectory" of each part is determined, together with their time order, a hand can be passively 
guided by it to solve the inverse kinematic/dynamic problem using PMP. 

A 2-D Block World problem will be used as an example of a real (more complex!)
assembly problem. However, our world has some additional features that are "unusual" for
classic A.I. formulations, such as shape, stable postures, and concurrency. Such features are
the source of complexity in the purely symbolic approach, whereas they are the main source of
information in the analogic scheme.

Let us consider a set of parts that must be assembled into a whole. In fig. 7a a cube, an
open box and a cover are randomly placed on a table. In general, some piece may prevent
the movements of some other, as in our case, where the cover is initially placed on
top of the cube.

In general, the final structure of the assembly is described by a set of phrases in a suitable
language at the symbolic level. Such a language is based on formulas that define the geometric
relationships between (at least) pairs of parts. Each formula is a relation. Besides the purely
geometric quantities, there must be some other attributes of the relation which specify its
semantics: typically, whether it is fixed or free or, in general, the degrees of freedom left and
their dynamics.

In A.I. and robotics, languages for specifying assemblies have been investigated for
several years (see for example (Lieberman and Wesley 1977)); however, they are used in a
purely logic/symbolic paradigm. In the example of figure 7, a general relation specifying the
final position of the cube (fig. 7c) may look like

rel cubel.face3 on box.face2 <transf> free 

rel box.face2 on table <transf> fixed 

where transf is the geometric relationship between the intrinsic reference frames of 
cubel.face3 and box.face2. A similar relation can be written for the cover and the box: 

rel cover.face2 on box.face4 <transf> free
where, for example, box.face4 is the small top of the "left arm" of the U-shaped box. 

Figure 7 a-i - Analogic planning of the cube-in-the-box problem 

Many human-oriented assembly instruction sheets convey comparable information to the
reader. The sheet in figure 8, showing an "exploded" vacuum cleaner, presents essentially
local relations, leaving the reader the task of deciding what to do first. A model of this
complexity would be a good testbed for the algorithms described below. Note, incidentally,
how intuitively the concept of "exploding the assembly" gives information about the sequence
of sub-actions (this concept will be used later on).

Figure 8 - A typical human-level assembly sheet 



7.1 Measurements vs. Inferences 

In general, relations define local information about the final assembly. For example, the
relation between the cube and the cover, call it $R_{cv}$, is not explicitly present in the previous
set of relations. The traditional approach is to define a formal logic model and to prove theorems
inside it; the relation $R_{cv}$ can then be deduced from the set of assertions and inference rules.
Another possible approach is to build an analog model (Funt 1983, Johnson-Laird 1983) of
the final assembly and to perform measurements on it. Such a model may be physically
analogic (for example, a 3-D memory) or some kind of simulation structure. Once it is built, all
global properties of the object can be discovered by direct measurements (distances, relations
in general) on the "mental model". Some relations are important from the point of view of
planning and must be found: they will be described in the next sections.

Similar relations must be defined for the non-movable parts of the environment. For the sake
of brevity, we omit a discussion of how to define shapes, assuming that some general formal
language such as that of (Popplestone and Ambler 1978) is available.

7.2 Building the Mental Model 

This is the first phase of planning. It consists of putting together all the partial relations and
building the whole assembly model; generally speaking, this model contains more explicit
information than the original set of partial relations.

Starting from the initial position (which can be described by a similar set of relations), the 
final model can be analogically built with a relaxation model, a sort of spatial Hopfield 
network, as follows: 

1. For each pair of pieces in a relation, connect them by a spring (see below).

2. Make each piece generate a repulsive force field to every other piece. 

3. Consider each piece as movable, subject only to the forces generated by springs and by 
repulsions. 

4. Let all pieces move simultaneously in space, with dynamics suitably defined so that
viscous terms dominate the law of motion, thus avoiding oscillatory movements.

5. When a local target relation is reached, turn on other external forces defined in the 
environment (e.g., gravity) acting on the involved pieces. 

6. Continuously monitor the local and global potential energy stored in the springs, which
should reduce to zero once all relations are satisfied (incidentally, the model also
converges in the presence of inconsistent sets of relations, simply without the energy
reducing to zero).

7. Start a backtracking action when some local minimum is reached; local minima are
equilibrium positions, or deadlocks, in which two or more pieces want to move toward
their goals but disturb each other. Two examples are: i) a piece navigating in some
direction enters a concavity of an obstacle or of a group of obstacles; ii) the cube tries to
enter the box while the cover is already in its final position. Escaping a deadlock
requires climbing back up the gradient of the global energy field.

8. Start a backtracking action when some unstable situation occurs, e.g. a piece falls
down due to external forces. Consider this a deadlock too.

9. Mark the deadlocks for the successive planning phase. 

This phase can also be considered as the analogic model of an assembly phase with an infinite 
number of hands available. 
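A much-simplified version of the relaxation procedure above can be written in a few lines. It treats pieces as points without orientation. It implements only steps 1 to 4, step 6 and the detection part of step 7; gravity, instability and backtracking are left out. The spring constant `k`, the repulsion gain `eta`, its range `r0` and the tolerances are made-up values. A relation `(i, j, offset)` is an assumed encoding of "piece j sits at piece i plus offset".

```python
import numpy as np


def relax_assembly(positions, relations, k=1.0, eta=0.01, r0=0.5,
                   dt=0.02, steps=5000, e_tol=1e-8, f_tol=1e-7):
    """Overdamped relaxation of point-like pieces toward their target relations.

    positions: (n, 2) array of piece positions.
    relations: list of (i, j, offset): piece j should sit at piece i + offset.
    Returns final positions, the spring-energy history and a status string.
    """
    x = np.array(positions, dtype=float)
    energies = []
    for _ in range(steps):
        force = np.zeros_like(x)
        energy = 0.0
        for i, j, offset in relations:            # steps 1 and 3: springs
            err = x[j] - (x[i] + np.asarray(offset, dtype=float))
            energy += 0.5 * k * err @ err
            force[j] -= k * err
            force[i] += k * err
        for i in range(len(x)):                    # step 2: mutual repulsion
            for j in range(i + 1, len(x)):
                d = x[i] - x[j]
                dist = np.linalg.norm(d) + 1e-9
                if dist < r0:
                    push = eta * (1.0 / dist - 1.0 / r0) * d / dist
                    force[i] += push
                    force[j] -= push
        energies.append(energy)                    # step 6: monitor the energy
        if energy < e_tol:
            return x, energies, "assembled"
        if np.max(np.abs(force)) < f_tol:          # step 7: stuck with residual energy
            return x, energies, "deadlock"
        x += dt * force                            # step 4: velocity proportional to force
    return x, energies, "not converged"


x, energies, status = relax_assembly([[0.0, 0.0], [3.0, 1.0]], [(0, 1, (1.0, 0.0))])
print(status, x[1] - x[0])
```

A run that stops with status "deadlock" corresponds to an equilibrium with residual spring energy. This is the situation that step 7 would hand over to backtracking.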

7.2.1 Springs

Springs are defined so that a roto-translatory force is generated by a law of the form (for the
planar case)

$$\begin{cases} F = K_t\,\Delta l \\ M = K_a\,\Delta\theta \end{cases}$$

where $\Delta l$ and $\Delta\theta$ are the translational and angular misplacements of the piece.
$F$ and $M$ are the force and torque generated by the misplacement of the piece with respect to
its attracting posture. In practice, a saturation of the force/length curve is required in order to
avoid excessive accelerations at the starting positions.
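One way to obtain the saturation mentioned above is to pass the linear spring law through a smooth limiter. The sketch uses `tanh`, which is an assumed choice. For small misplacements the force and torque follow the linear law with stiffnesses `k_t` and `k_a`. Their magnitudes, however, never exceed `f_max` and `m_max`. Here `dl` is taken as the vector from the piece to its attracting position, so the force points toward the goal. All parameter values are illustrative.

```python
import numpy as np


def spring_force(dl, dtheta, k_t=1.0, k_a=1.0, f_max=0.5, m_max=0.2):
    """Planar roto-translational spring with smooth saturation.

    For small misplacements F ~ k_t * dl and M ~ k_a * dtheta; for large ones
    the magnitudes approach f_max and m_max instead of growing without bound.
    """
    dl = np.asarray(dl, dtype=float)
    norm = np.linalg.norm(dl)
    if norm == 0.0:
        F = np.zeros_like(dl)
    else:
        F = f_max * np.tanh(k_t * norm / f_max) * dl / norm
    M = m_max * np.tanh(k_a * dtheta / m_max)
    return F, M


F_small, M_small = spring_force([1e-4, 0.0], 1e-4)   # linear regime
F_big, M_big = spring_force([100.0, 0.0], 100.0)      # saturated regime
print(F_small, M_small, F_big, M_big)
```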

7.2.2 Deadlocks 

The possible deadlock situations are important. They are intrinsic sources of partial ordering
among the actions, and the planner must recognize them. We use the term rendez-vous
(RV) to denote the instant at which two parts meet to form a subassembly.

There are four basic situations. 

1. Dead end. A piece navigating toward a target is caught in a local minimum of the field. This
does not involve a different partial ordering per se, but can be resolved by introducing a
(dummy) obstacle at the point where the deadlock occurred.

2. Intersecting Work Spaces. A subassembly lies half on one side of another
subassembly and half on the other side. An example is a bolt that crosses a plane
surface, with a nut on the other side: the RV of the bolt/nut subassembly must be
delayed until after the RV between the bolt and the surface.

3. Box. An open cavity (possibly infinite) cannot be closed until the contained
subassembly reaches its minimum energy (see figs. 7b and 7g-h-i).

4. Instability. The external forces disassemble an already obtained subassembly. For 
example, the top of a table cannot be put on only two legs; such RV must be delayed. 

Discovering deadlocks can be seen as a form of temporal measurement (the analogic
counterpart of temporal reasoning) on the analog model. Many of them can be checked in the
first phase, possibly by executing the process several times with random laws of motion. Other
deadlocks must be discovered in phase two.

7.3 Disassembling for Ordering Actions 

The mental model of the assembly can now be disassembled (exploded) by:

1. eliminating the springs;

2. inserting new springs between pieces and their starting points;

3. making all objects repel each other.

Disassembling an object is a form of measurement that gives information on partial orderings. By monitoring the relative speeds of pieces leaving their positions, it is possible to determine which piece can move first; pieces free to move first during the explosion are candidates for being the last parts to be joined together. For example (fig. 7d), the cover goes away rapidly while the box is slower, indicating that the RV cover-box must follow the RV cube-box. A similar situation occurs for the bolt-plane-nut case, whereas a different technique must be applied to the "instability" case.

If the assembly is composed of many parts, the explosion phase can be iterated on the inner parts to obtain the desired resolution in the monitoring process.

By disassembling the object, a List of Partial Orderings (POL) can be built. 
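The ordering rule used during the explosion can be sketched in Python as follows. The departure speeds are hypothetical numbers standing in for measurements on the analog model, and the tolerance `tol`, below which no ordering is inferred between two pieces, is an assumption of this sketch.

```python
def assembly_stages(departure_speeds, tol=0.1):
    """Turn departure speeds measured during an explosion into assembly stages.

    Pieces that leave fastest were free to move first, so they are the last
    to be joined; the slowest pieces are joined first. Pieces whose speed is
    within `tol` of the first piece of the current stage share that stage,
    i.e. no ordering is inferred between them.
    """
    ordered = sorted(departure_speeds.items(), key=lambda item: item[1])
    stages = []  # list of (piece names, reference speed of the stage)
    for name, speed in ordered:
        if stages and speed - stages[-1][1] < tol:
            stages[-1][0].append(name)
        else:
            stages.append(([name], speed))
    return [names for names, _ in stages]


# Hypothetical speeds for the cube/box/cover example of fig. 7d.
print(assembly_stages({"cover": 1.5, "box": 0.3, "cube": 0.4}, tol=0.2))
# [['box', 'cube'], ['cover']]
```

Box and cube end up in a first stage and the cover in a later one, matching the conclusion that the RV cover-box follows the RV cube-box. Iterating the explosion on inner parts would refine each stage further.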

7.4 Third Phase 

The third and last phase involves checking the correctness of the POL with respect to the initial situation. It may happen, in fact, that moving piece P1 before P2 generates a deadlock, for example if P1 is supporting P2. This can be checked by the following procedure:

1. select each piece to be moved, one at a time, following the POL;

2. connect it by a spring with its final position; 

3. let all the other pieces be movable obstacles; 

4. let the selected piece go to its target place; 

5. monitor the movements of the others, finding relevant deadlocks; 

6. for every deadlock, generate a movement able to avoid the deadlock, and put it into the 
POL before the selected action. 

For example, the cube is under the cover. Moving the cube pushes the cover away; this requires displacing the cover to some free position, or grasping it (if a free hand is available to hold it while another hand acts on the cube). In general, in this phase the original setup of parts is considered an assembly, S0, that must be disassembled to compose a new one, S1. If not every piece is free to go from S0 to S1 following the POL, S0 must be (partially) disassembled. Such a process requires the same analog modelling technique as before, plus a set of possible free places where single parts can be put down, if necessary.

At this point, we have the complete POL, which is a series/parallel ordering of actions.

7.5 Concurrency vs. Sequence 

The RV constraints found so far are intrinsic, so many parts may still be moved concurrently. If only a limited number of hands is available, concurrent operations may require serialization, possibly total. In some cases this is not possible: for example, a bolt cannot be left inside a vertical hole, with its head down, waiting for the nut. These situations can nevertheless be checked using the same analog model.
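Serializing a partial order for a limited number of hands is a scheduling problem. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) to emit at most `hands` actions per step. The action names are hypothetical, and the physical feasibility of leaving a part waiting (the bolt example above) is not checked; it would still have to be verified on the analog model.

```python
from graphlib import TopologicalSorter


def serialize(pol, hands):
    """Schedule the actions of a partial order using at most `hands` per step.

    `pol` maps each action to the set of actions that must precede it.
    Returns a list of steps; the actions within one step run concurrently.
    """
    sorter = TopologicalSorter(pol)
    sorter.prepare()
    pending = []  # actions released by the sorter but not yet scheduled
    steps = []
    while sorter.is_active():
        pending.extend(sorted(sorter.get_ready()))
        step, pending = pending[:hands], pending[hands:]
        steps.append(step)
        sorter.done(*step)  # unlocks the actions that depend on this step
    return steps


# Hypothetical table assembly: the top must wait for all three legs
# (cf. the "instability" deadlock above).
table = {"leg1": set(), "leg2": set(), "leg3": set(),
         "top": {"leg1", "leg2", "leg3"}}
print(serialize(table, hands=2))  # [['leg1', 'leg2'], ['leg3'], ['top']]
print(serialize(table, hands=1))  # fully serial: one action per step
```

With one hand the result is a total order; with more hands, intrinsically unordered actions share a step.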

As an example, figure 7 shows the assembly of a cube inside a closed box, in the classic blocks-world paradigm typical of A.I. (serial actions, one hand). Fig. 7a is the start position and fig. 7c the final one. Phase 1 found a deadlock in fig. 7b; fig. 7d is the "explosion" of the mental model; fig. 7e is the beginning of phase 3, when the cover is pushed away while attempting to move the cube. Figs. 7f through 7i show the actual assembly phase: the cover is displaced, then the cube is moved to its final position, then the cover is placed on the box.

The POL, in our case, is obviously

((cover FreePoint1) (cube CupBottom) (cover CupTop))
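The third-phase procedure can be mimicked symbolically in Python. In this sketch the analog simulation of pushes is replaced by an assumed "rests on" relation: moving a piece disturbs every piece resting on it, and each disturbed piece is first moved to a free place. The relation, the support name `table` and the free-place list are hypothetical; with the cube-and-cover setup described above, the sketch reproduces this POL.

```python
def repair_pol(pol, rests_on, free_places):
    """Phase-3 check on a symbolic stand-in for the analog model.

    pol:         list of (piece, destination) pairs from the explosion phase
    rests_on:    dict piece -> what it rests on in the initial setup S0
    free_places: free places where parts can be put down
    Moving a piece is assumed to push away every piece resting on it (a
    deadlock); such pieces are first moved to a free place, recursively.
    """
    state = dict(rests_on)
    free = list(free_places)
    complete = []

    def clear(piece):
        for other in [p for p, support in state.items() if support == piece]:
            clear(other)                 # clear anything stacked on `other` first
            place = free.pop(0)          # IndexError if no free place is left
            complete.append((other, place))
            state[other] = place

    for piece, destination in pol:
        clear(piece)
        complete.append((piece, destination))
        state[piece] = destination
    return complete


# Initial setup as described in the text: the cover lies on the cube.
pol = [("cube", "CupBottom"), ("cover", "CupTop")]
print(repair_pol(pol, {"cover": "cube", "cube": "table"}, ["FreePoint1"]))
# [('cover', 'FreePoint1'), ('cube', 'CupBottom'), ('cover', 'CupTop')]
```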

A comparison with a traditional STRIPS planner solution, for example, is left to the reader. 

The solutions shown in the previous figures were found by a preliminary implementation 
of the Analogic Planner, written in Nem++, a language for spatial and temporal models which 
is an engineered development of Nem (Marino et al. 1985). 



Acknowledgements 

This work was partially supported by the EEC FIRST Project and by the Special Project on Robotics of the Italian Research Council.



References 

Lieberman, L.I., and Wesley, M.A. (1977) AUTOPASS: an automatic programming system for computer controlled mechanical assembly, IBM Journal of Research and Development, July 1977

Popplestone, R.J., Ambler, A., and Bellos, I. (1978) RAPT: a language for describing assemblies, Industrial Robot, vol. 5, no. 3, 131-137

Hatze, H. (1981) Myocybernetic control models of skeletal muscle, University of South Africa, Pretoria

Hopfield, J.J. (1982) Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences USA, 79, 2554-2558

Funt, B.V. (1983) Analogical models of reasoning and process modeling, IEEE Computer, 16(10), 99-104

Johnson-Laird, P.N. (1983) Mental models, Cambridge University Press, Cambridge

Marino, G., Morasso, P., and Zaccaria, R. (1985) Motor knowledge representation, Proc. IJCAI (Los Angeles, California)

Marino, G., Morasso, P., and Zaccaria, R. (1986) NEM: a language for the representation of motor knowledge, in Human Movement Understanding (P. Morasso and V. Tagliasco, Eds.), North Holland

Morasso, P., and Mussa Ivaldi, F.A. (1986) The role of physical constraints in natural and artificial manipulation, Proc. IEEE Conf. Robotics and Automation (San Francisco, California)

Morasso, P., Mussa Ivaldi, F.A., and Zaccaria, R. (1987) Redundant robotic manipulators. I: regularizing by mechanical impedance, NATO ARW Kinematic and Dynamic Issues in Sensor Based Control

Flash, T. (1987) The control of hand equilibrium trajectories in multi-joint arm movements, Biological Cybernetics, 57, 257-274

Kawato, M., Furukawa, K., and Suzuki, R. (1987) A hierarchical neural-network model for control and learning of voluntary movement, Biological Cybernetics, 57, 169-185

Kuperstein, M. (1987) Adaptive visual-motor coordination in multijoint robots using parallel architecture, Proc. IEEE Conf. Robotics and Automation, 1595-1602

Mussa Ivaldi, F.A., Morasso, P., and Zaccaria, R. (1988) Kinematic networks: a distributed model for representing and regularizing motor redundancy, Biological Cybernetics, 60, 1-16

Adorni, G., Camurri, A., Poggi, A., and Zaccaria, R. (1988) Integrating spatio-temporal knowledge: a hybrid approach, 8th European Conf. on Artificial Intelligence (Munich, GFR)

Casalino, G., Vercelli, G., and Zaccaria, R. (1988) Trajectory formation and path planning of robotized continuous welding tasks, 5th ESPRIT European Conf. (Brussels)

Morasso, P., Vercelli, G., and Zaccaria, R. (1988) Rappresentazione del movimento in Robotica: modello, ricostruzione, data base cognitivo in un caso antropomorfo, 4th SIRI National Conf. (Milano, Italy)

Steels, L. (1988) Steps towards common sense, Proc. ECCAI88 Conf. (Munich, GFR)

Bullock, D., and Grossberg, S. (1988) Neural dynamics of planned arm movements: emergent invariants and speed-accuracy properties during trajectory formation, Psychological Review, 95, 49-90

Jordan, M. (1988) Supervised learning and systems with excess degrees of freedom, COINS Technical Report 88-27

Morasso, P., Mussa Ivaldi, F.A., Vercelli, G., and Zaccaria, R. (1989) A connectionist formulation of motor planning, in Connectionism in Perspective (R. Pfeifer, Z. Schreter, F. Fogelman, and L. Steels, Eds.), Elsevier




Structural Constraints and Computational Problems in Motor Control

F. A. Mussa-Ivaldi and E. Bizzi
Massachusetts Institute of Technology,
Cambridge, MA 02139,
USA



1. Introduction 

A central dogma of computational approaches to motor control is that, in order to understand the biological organization of information processing, it is crucial to understand the nature of the problems being solved by the motor system and of their possible solutions. This statement has defined a common ground for investigations in robotics and neuroscience. Among the computational problems considered in robotics, two have played a prominent role: finding the time course of joint configurations given a desired motion expressed in task coordinates (inverse kinematics), and finding the torques that must be applied to achieve a predefined motion of a mechanism (inverse dynamics).

But if these are fundamental issues in motor control, why have so few insights from robotics proven to be of biological relevance? One answer to this question comes from the observation that the structural (i.e. mechanical and geometrical) properties of motor systems contribute significantly to the definition of the computational problems and to their solutions. For example, in several robotics applications the actuators are force generators with negligible impedance properties. Such a "low-level" actuator property is sufficient to motivate extensive research on problems of inverse dynamics, whose goal is the specification of force patterns. Another feature characterizing many artificial manipulators is their lack of redundancy. Only six kinematic parameters are required to completely determine the location (position and orientation) of a rigid tool in the workspace. Consequently, the most "standard" manipulators have been designed to have six independently controlled degrees of freedom. In this case, inverse kinematics is a well-posed problem which can be solved, away from singular configurations, by inverting the manipulator Jacobian matrix. Therefore, for a non-redundant manipulator the problems of dynamics and control can be addressed independently of kinematic issues. In fact, the latter have (locally) unique solutions for any given task.

This is clearly not the case with biological systems, which are instead characterized by a large degree of kinematic redundancy. For example, we can keep our hand at a given location of the workspace with different sets of joint angles. Given a desired location of the hand, the choice of a body configuration has a direct influence on the hand impedance, which characterizes the force generated in response to an externally imposed motion. Thus, any particular kinematic solution has an immediate effect on the control of the interactions with the environment. Another substantial difference between artificial and biological systems regards the properties of the actuators. As we stated above, robots are typically operated by nearly ideal effort sources: sources whose force output does not depend upon position. In contrast, the biological muscle makes a "poor" actuator when considered as a force generator. A muscle is indeed characterized by non-negligible elastic properties which depend upon its level of activation.

In this paper we will start by reviewing some experimental studies which demonstrated that these "tunable" elastic properties of muscles are not a disadvantage for the neural controller. Quite the contrary: they provide the necessary physical substrate for controlling posture, movement and contacts with the environment in a unified way and, ultimately, for avoiding inverse dynamics computations. Then, we will show how the elastic properties of the muscles lead to the regularization of the ill-posed problems associated with kinematic redundancy.

2. Actuator mechanics and computation 

Several investigators of biological systems (Feldman, 1966; Rack and Westbury, 1969; Bizzi et al., 1976; Nichols and Houk, 1976) have focused on the role of muscle mechanical properties in motor control. The steady-state force/length relationship in individual muscle fibers was studied by Gordon et al. (1966). These authors related the development of tension at different muscle lengths to the degree of overlap between actin and myosin filaments. (In the structural organization of vertebrate striated muscles, sarcomeres are the constituent units which are repeated longitudinally along the muscle fibers. These units consist primarily of comb-like arrays of overlapping actin and myosin filaments.) The process of generating force within the muscle is caused by the interaction between actin and myosin filaments (Huxley, 1969). Briefly, when the motoneuronal drive to a muscle increases, new chemical bonds, or "cross-bridges", are formed between the actin and myosin filaments. Consequently, an increase of muscle tension is generated concomitantly with an increase of muscle stiffness.

In spite of the complexity of the molecular processes underlying the generation of muscle force, at the macroscopic level a muscle can be characterized as a tunable elastic element. This behavior is illustrated in Figure 1, where tensions measured in the cat soleus muscle are plotted against muscle lengths at various levels of neural activation. The data were obtained by Rack and Westbury (1969), who devised a method for applying a set of stimuli to different portions of the same nerve. All the stimuli had the same frequency but different phases, in order to avoid synchronous stimulation of all muscle fibers. Then, for each tested frequency, Rack and Westbury measured the steady-state tension corresponding to different muscle lengths.

Conceptually, muscle spring-like behavior can be summarized as follows: 

1. For a fixed value of the motoneuronal input, the force generated at steady state is a function of muscle length.

2. This force/length function can be integrated to derive a potential function equivalent to the energy stored (or released) during an externally imposed displacement.

3. Changes of neural input to the muscle smoothly affect the force/length relationship 
without drastically changing its shape. 

Indicating by $f$, $l$ and $u$ the tension, the length and the neural input to a muscle, and by $E$ the stored elastic energy, the above statements are summarized by the expressions

$$f = f(l, u) \quad \text{and} \quad E = E(l, u) = \int f(l, u)\, dl \qquad (1)$$

Figure 1: Length-tension curves measured in the cat's soleus muscle (horizontal axis: angle of ankle, from 150° to 30°). Different curves correspond to different stimulus rates. The vertical lines show the limits of tension fluctuations during synchronous stimulation. (From Rack and Westbury, 1969.)

We want to stress that in the above definition of spring-like behavior the force/length relation is not required to be linear. As a matter of fact, the force/length curves in Figure 1 display considerable non-linearity. However, for any value of length there is a locally linear relation between displacements, $dl$, and effort changes, $df$:

$$df = k(l)\, dl.$$

In the above expression, $k = \partial f / \partial l$ is the local muscle stiffness which, in general, depends upon muscle length. Whenever the stiffness is not zero, it is also possible to define a local compliance, $c = k^{-1}$, which maps an input force change into an output displacement.
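The quantities in Equation (1), together with the local stiffness and compliance, can be illustrated numerically. The muscle model in this Python sketch is hypothetical (an exponential tension curve scaled by the input `u`, chosen only to be non-linear) and is not a fit to Rack and Westbury's data; the energy is computed by the trapezoidal rule and the stiffness by central differences.

```python
import math

L_SLACK = 1.0  # hypothetical slack length: no tension below it
A = 3.0        # hypothetical shape parameter of the tension curve


def tension(l, u):
    """Steady-state tension f(l, u) of a hypothetical muscle model.

    Non-linear in length, and scaled smoothly by the input u without
    changing the shape of the curve."""
    if l <= L_SLACK:
        return 0.0
    return u * (math.exp(A * (l - L_SLACK)) - 1.0)


def stored_energy(l, u, n=1000):
    """E(l, u): integral of f(lambda, u) d lambda from the slack length to l
    (trapezoidal rule with n intervals)."""
    if l <= L_SLACK:
        return 0.0
    h = (l - L_SLACK) / n
    total = 0.5 * (tension(L_SLACK, u) + tension(l, u))
    total += sum(tension(L_SLACK + i * h, u) for i in range(1, n))
    return total * h


def stiffness(l, u, dl=1e-6):
    """Local stiffness k = df/dl by central differences."""
    return (tension(l + dl, u) - tension(l - dl, u)) / (2.0 * dl)


def compliance(l, u):
    """Local compliance c = 1/k, defined only where k is non-zero."""
    k = stiffness(l, u)
    if k == 0.0:
        raise ValueError("compliance is undefined where the stiffness is zero")
    return 1.0 / k


for u in (1.0, 2.0):
    print(u, tension(1.3, u), stiffness(1.3, u), compliance(1.3, u))
```

Raising `u` increases both tension and stiffness at the same length, which is the "tunable spring" behavior described above.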

Muscles are arranged about the joints in agonist-antagonist configurations. Hence, a limb's posture is maintained when the forces exerted by the agonist and antagonist muscle groups are equal and opposite. When an external force is applied, the limb is displaced by an amount depending both upon the external force and upon the stiffness of the muscles¹. When the external force is removed, the limb should return to the original position. Experimental studies of motor behavior in humans and monkeys have confirmed that arm posture is indeed characterized by spring-like properties (Bizzi et al., 1984; Mussa-Ivaldi et al., 1985).

2.1. Virtual trajectories 

This view of posture has been extended to the analysis of movement and trajectory formation (Bizzi et al., 1984; Hogan, 1984; McKeon et al., 1984; Flash, 1987). The observation that posture is maintained by the equilibrium between the length-tension properties of opposing muscles led to the idea that movements result from a shift of the equilibrium point caused by a change in neural input. This hypothesis was first proposed by Feldman (1966).

Here, we discuss the implications of the moving-equilibrium hypothesis from a computational perspective. To this end, we start with a formal definition of the concept of virtual trajectory (a detailed discussion of related control issues can be found in Hogan, 1985). Let us first consider the case of a simple one-dimensional system characterized by the tunable-elastic behavior of Equation (1). Let the stiffness be defined, locally about a given length and at constant input, as $k = \partial f / \partial l$. The equilibrium position of the system, in the absence of external forces, is given by the condition $f(l, u) = 0$. A stable equilibrium position is characterized by negative stiffness (i.e. the forces about the equilibrium are attractive), whereas a positive stiffness implies an unstable equilibrium, similar to the equilibrium of a ball on the top of a hill.

If the input is a continuous function of time, $u(t)$, the static equilibrium condition becomes

$$f_e(l, u(t)) = 0 \qquad (2)$$

where the subscript $e$ indicates that $f_e$ is the "elastic" or "steady-state" component of the total force acting on the system. If the environment does not prevent the system from moving as the input changes in time, the total force may also contain significant inertial and viscous components.

The above equation implicitly defines a (unique) map from $u(t)$ to $l$:

$$l_e(t) = l(u(t)) \qquad (3)$$

with $l_e(t)$ and $u(t)$ satisfying Equation (2), provided that the stiffness, $k$, at each equilibrium position, $l_e(t)$, is not zero. This is a direct consequence of a fundamental theorem on implicit functions².

¹ For the present discussion it is immaterial that part of the muscle "restoring force" may be due to reflexes. In fact, the static reflex response can be modelled by expressing the input $u$ as the combination of a centrally originated signal, $u_c$, and a feedback signal, $u_R(l)$, which is a function of $l$. Then $f = f(l, u_R(l), u_c) = \tilde{f}(l, u_c)$, and the reflex behavior is equivalent to a stiffness component.

The function $l_e(t)$ is the virtual trajectory of the system corresponding to the input function $u(t)$. A few important features of the virtual-trajectory concept are highlighted by this simple one-dimensional example, in particular:

• it is not a "hypothesis" but a direct consequence of muscle spring-like properties;

• it can be regarded as a map from the control input ($u$) to an entity ($l_e$) which has the same physical dimension as a state component ($l$, the actual position of the system at any time);

• this map is given by an algebraic, rather than a differential, equation³.
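The algebraic nature of the virtual trajectory can be seen in a small Python sketch of a hypothetical agonist-antagonist pair acting on a one-dimensional position `x`. Each muscle uses an assumed exponential tension curve; at every sample of u(t) the equilibrium is found by bisection on $f_e(x, u) = 0$, with no integration of equations of motion, and the local stiffness at each solution is negative, i.e. the equilibrium is stable.

```python
import math

L0 = 1.5       # hypothetical muscle length when x = 0
L_SLACK = 1.0  # hypothetical slack length
A = 3.0        # hypothetical shape parameter


def tension(l, u):
    """Hypothetical tunable spring: tension grows with length l and input u."""
    return u * (math.exp(A * (l - L_SLACK)) - 1.0) if l > L_SLACK else 0.0


def elastic_force(x, u_ag, u_ant):
    """Net steady-state force f_e(x, u) on the position x.

    The agonist shortens and the antagonist lengthens as x increases."""
    return tension(L0 - x, u_ag) - tension(L0 + x, u_ant)


def stiffness_at(x, u_ag, u_ant, dx=1e-6):
    """Local stiffness k = d f_e / dx (negative means a stable equilibrium)."""
    return (elastic_force(x + dx, u_ag, u_ant)
            - elastic_force(x - dx, u_ag, u_ant)) / (2.0 * dx)


def equilibrium(u_ag, u_ant, lo=-0.5, hi=0.5, tol=1e-12):
    """Solve the algebraic equation f_e(x, u) = 0 by bisection.

    For positive inputs f_e > 0 at lo and f_e < 0 at hi, so a root lies
    in between."""
    f_lo = elastic_force(lo, u_ag, u_ant)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = elastic_force(mid, u_ag, u_ant)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def virtual_trajectory(inputs):
    """Map sampled inputs u(t) = [(u_ag, u_ant), ...] to x_e(t), sample by
    sample, without integrating any equation of motion."""
    return [equilibrium(u_ag, u_ant) for u_ag, u_ant in inputs]


ramp = [(1.0 + 0.25 * i, 1.0) for i in range(5)]  # agonist input rises
xs = virtual_trajectory(ramp)
print(xs)
print([stiffness_at(x, ua, un) < 0.0 for x, (ua, un) in zip(xs, ramp)])
```

Each sample is solved independently, so no initial condition enters the computation, in line with the algebraic character of the map.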

It is interesting to observe that, because muscles have significant stiffness (i.e. $k$ is generally different from zero), they cannot be considered "good" force generators. An ideal force generator, like an ideal voltage source in an electric circuit, should have negligible output impedance. In other words, its output force should depend only upon the control input and not upon the operating position. This criterion has been followed in the design of many robotic actuators, such as conventional torque motors. In contrast, biological evolution seems to have taken a different route, developing muscles whose stiffness is not only significant but also depends on the level of neural activation. The notion of virtual trajectory turns this feature into a strength rather than a weakness of the biological design. Indeed, according to the implicit-function theorem, non-zero actuator stiffness is what guarantees that a sequence of control inputs maps into a sequence of well-defined equilibrium lengths. However, we will show that points of zero muscle stiffness do not necessarily constitute a problem in a multi-dimensional situation.

² For two variables, $x$ and $y$, this theorem can be stated as follows. Let $F(x, y)$ be a function continuous together with its first partial derivatives in a region $R$ of the $(x, y)$ plane, and let the following conditions be satisfied: (a) $F(x^0, y^0) = 0$ at a point $P^0 = (x^0, y^0)$ in $R$, and (b) the partial derivative $\partial F / \partial y$ is different from zero at $P^0$. Then the equation $F(x, y) = 0$ can be solved for $y$, i.e. a single-valued function $y = \varphi(x)$ can be defined in the vicinity of $P^0$.

³ In general, for a dynamic system, a map of the form $\dot{x} = g(x, u)$ is defined. This map allows us to derive the future time course of the system state, $x(t)$, i.e. the actual trajectory, given the input function $u(t)$ and the initial value of the state, $x_0$. Since the state equation is a differential equation, its solution depends on the arbitrary selection of the initial conditions. In contrast, the virtual trajectory, being the result of an algebraic equation, does not contain arbitrary terms. Actually, the virtual trajectory defines the initial conditions for the state equations, since the system is at static equilibrium before the onset of a movement.

The extension of the virtual-trajectory concept to a multi-dimensional system is straightforward. Let us consider, for example, a serial arrangement of $n$ limb segments interconnected by rotational joints. The configuration of this system is the array $\mathbf{q} = [q_1, \ldots, q_n]^T$ of joint angles, and the net torque is the array $\boldsymbol{\tau} = [\tau_1, \ldots, \tau_n]^T$ of joint torques. Furthermore, the limbs are driven by a set of $m$ muscles with $m > n$, i.e. there are more muscles than joints. Collectively, the muscles are represented by a length array, $\mathbf{l} = [l_1, \ldots, l_m]^T$, by a force array, $\mathbf{f} = [f_1, \ldots, f_m]^T$, and by an array of input activations, $\mathbf{u} = [u_1, \ldots, u_m]^T$. Under the hypothesis that each muscle is characterized by tunable spring-like properties, the overall muscle behavior is described by the vector equivalent of Equation (1), namely

$$\mathbf{f} = \mathbf{f}(\mathbf{l}, \mathbf{u}) \qquad (4)$$

We obtain the $n$-dimensional joint-torque vector from the $m$-dimensional muscle-tension vector by multiplying the latter by the transpose of the matrix of moment arms, $\boldsymbol{\mu}$. The elements of $\boldsymbol{\mu}$ are the partial derivatives $\mu_{ij} = \partial l_i / \partial q_j$. The relation $\mathbf{l} = \mathbf{l}(\mathbf{q})$, which uniquely defines the muscle-length array as a function of the joint configuration, is in general non-linear. Therefore, the matrix of moment arms is also a function of joint configuration. Using Equation (4), the net joint torque vector becomes

$$\boldsymbol{\tau} = \boldsymbol{\mu}(\mathbf{q})^T \mathbf{f} = \boldsymbol{\mu}(\mathbf{q})^T \mathbf{f}(\mathbf{l}(\mathbf{q}), \mathbf{u}) = \boldsymbol{\tau}(\mathbf{q}, \mathbf{u}) \qquad (5)$$

When the input array is a function of time, $\mathbf{u}(t)$, the static equilibrium condition $\boldsymbol{\tau} = \mathbf{0}$ implicitly defines the joint virtual trajectory as a function of $\mathbf{u}(t)$:

$$\mathbf{q}_e(t) = \mathbf{q}(\mathbf{u}(t)) \quad \text{such that} \quad \boldsymbol{\tau}(\mathbf{q}_e(t), \mathbf{u}(t)) = \mathbf{0} \qquad (6)$$

which is analogous to Equation (3). From the fundamental theorem on implicit functions, a sufficient condition for the local existence and uniqueness of the virtual trajectory $\mathbf{q}_e(t) = \mathbf{q}(\mathbf{u}(t))$ is that the determinant of the joint stiffness matrix $\mathbf{k}$, whose elements are $k_{ij} = \partial \tau_i / \partial q_j$ ($i, j = 1, \ldots, n$), does not vanish⁴ at any equilibrium point.

This condition is the multi-dimensional counterpart of requiring non-zero stiffness in a one-dimensional system. Here, stable equilibrium corresponds to a negative-definite joint stiffness matrix (the counterpart of negative stiffness in one dimension), whereas a stiffness matrix with at least one positive eigenvalue implies unstable equilibrium. Note, however, that requiring a non-zero joint stiffness determinant is less stringent than requiring non-zero stiffness for each muscle. The stiffness about a joint is obtained from a weighted sum of the stiffnesses supplied by all the muscles operating about that joint. Hence, in order to have non-zero joint stiffness it is not necessary to require non-zero stiffness in every muscle. In fact, there is experimental evidence (Mussa-Ivaldi et al., 1985) that the multi-joint elastic forces during posture are attractive. In that case, the condition for defining virtual trajectories in hand or joint coordinates is satisfied.
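The torque map of Equation (5) and the conditions on the joint stiffness can be checked numerically. This Python sketch assumes a hypothetical planar model with two joints and three muscles (one biarticular), constant moment arms, and muscles linearized as springs whose stiffness and rest length are tuned by their inputs; forces use the restoring sign convention so that $\boldsymbol{\tau} = \boldsymbol{\mu}^T \mathbf{f}$ as in Equation (5). Newton's method finds one point of the virtual trajectory, which requires $\det \mathbf{k} \neq 0$; stability is tested as negative definiteness, which for a symmetric 2x2 matrix means a negative diagonal entry and a positive determinant.

```python
# Hypothetical planar model: 2 joints, 3 muscles (muscle 3 is biarticular).
MU = [[0.04, 0.00],   # moment arms d l_i / d q_j (m/rad), assumed constant
      [0.00, 0.03],
      [0.02, 0.02]]
L_REF = [0.30, 0.25, 0.40]  # muscle lengths at q = (0, 0), in metres


def muscle_params(u_i):
    """Assumed input dependence: stiffness and rest-length offset set by u_i."""
    return 100.0 * u_i, -0.01 * u_i


def torque(q, u):
    """Equation (5) for this model: tau = mu^T f, f_i = -k_i (l_i - rest_i)."""
    tau = [0.0, 0.0]
    for i, row in enumerate(MU):
        length = L_REF[i] + row[0] * q[0] + row[1] * q[1]
        k, offset = muscle_params(u[i])
        f = -k * (length - (L_REF[i] + offset))  # restoring sign convention
        tau[0] += row[0] * f
        tau[1] += row[1] * f
    return tau


def joint_stiffness(q, u, dq=1e-6):
    """Joint stiffness matrix k_ij = d tau_i / d q_j by central differences."""
    K = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        q_plus, q_minus = list(q), list(q)
        q_plus[j] += dq
        q_minus[j] -= dq
        t_plus, t_minus = torque(q_plus, u), torque(q_minus, u)
        for i in range(2):
            K[i][j] = (t_plus[i] - t_minus[i]) / (2.0 * dq)
    return K


def det2(K):
    return K[0][0] * K[1][1] - K[0][1] * K[1][0]


def is_stable(K):
    """Symmetric 2x2 K is negative definite iff K[0][0] < 0 and det K > 0."""
    return K[0][0] < 0.0 and det2(K) > 0.0


def equilibrium(u, q0=(0.0, 0.0), iters=20):
    """Newton's method on tau(q, u) = 0 (one point of the virtual trajectory)."""
    q = list(q0)
    for _ in range(iters):
        tau = torque(q, u)
        K = joint_stiffness(q, u)
        d = det2(K)
        if d == 0.0:
            raise ValueError("det k = 0: virtual trajectory not defined here")
        # Solve K * step = -tau by Cramer's rule.
        q[0] += (-tau[0] * K[1][1] + tau[1] * K[0][1]) / d
        q[1] += (-tau[1] * K[0][0] + tau[0] * K[1][0]) / d
    return q


u = [1.0, 1.0, 1.0]
q_e = equilibrium(u)
print(q_e, is_stable(joint_stiffness(q_e, u)))
# A muscle with zero stiffness does not make the joint stiffness singular:
print(det2(joint_stiffness([0.0, 0.0], [0.0, 1.0, 1.0])) != 0.0)  # True
```

Setting the first input to zero removes that muscle's stiffness, yet the joint stiffness determinant stays non-zero because the remaining muscles still span both joints.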

The muscle spring-like properties allow mapping the input (u) into a lower dimensional entity, the 
equilibrium arm configuration, which has the physical dimension of a position variable. Hence, the 
virtual trajectory, together with the joint stiffness, can be regarded as a summary of all the inputs 
involved in the execution of a movement. An important consequence of virtual trajectories is that 
they can be used to avoid the computation of inverse dynamics. The inertial and viscous 
components of the dynamics equations can be treated as "perturbing torques" which make the arm 
deviate from the desired path, specified as a sequence of static equilibrium configurations. At 
moderate speeds and accelerations, the difference between actual and virtual motion is small and 
can be neglected. Clearly, as movement speed and acceleration increase, limb inertia and viscosity 
are expected to cause larger deviations from the virtual trajectory. However, these deviations can 
be corrected by increasing the stiffness and/or by modifying the virtual trajectory itself. 
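The effect of stiffness on the deviation between actual and virtual motion can be illustrated with a one-dimensional Python simulation. It assumes a hypothetical point mass `m` pulled by a spring of stiffness `k` toward a minimum-jerk virtual trajectory, with viscosity `b`, integrated by semi-implicit Euler; the numbers are illustrative and not taken from arm data.

```python
def min_jerk(t, T=0.5, D=0.3):
    """Hypothetical virtual trajectory: minimum-jerk shift of the equilibrium
    from 0 to D over T seconds."""
    if t >= T:
        return D
    s = t / T
    return D * (10 * s**3 - 15 * s**4 + 6 * s**5)


def max_deviation(k, m=1.0, b=5.0, T=0.5, D=0.3, dt=1e-4, t_end=1.0):
    """Integrate m x'' = k (x_e(t) - x) - b x' and return max |x - x_e|.

    Inertial and viscous terms act as the 'perturbing' forces that pull the
    actual motion x away from the virtual trajectory x_e."""
    x, v, t = 0.0, 0.0, 0.0
    worst = 0.0
    while t < t_end:
        a = (k * (min_jerk(t, T, D) - x) - b * v) / m
        v += a * dt  # semi-implicit (symplectic) Euler step
        x += v * dt
        t += dt
        worst = max(worst, abs(x - min_jerk(t, T, D)))
    return worst


for k in (50.0, 200.0, 1000.0):
    print(k, max_deviation(k))
```

Comparing the printed values shows the deviation shrinking as the stiffness grows, one of the two corrections mentioned above; reshaping `min_jerk` would be the other.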

The elastic properties of the actuators also provide stability to the planned movements (Hogan, 1985). In a computational approach based on the explicit solution of inverse dynamics, problems arise when external perturbations (e.g. hitting an obstacle) cause a deviation from the preprogrammed path, since the dynamics must be recomputed or, alternatively, some special servo control mechanism must be provided. In contrast, if movement is obtained by shifting the equilibrium position defined by elastic actuators, a deviation from the intended path simply results in larger restoring forces, without the need for computation or for particular control schemes (McKeon et al., 1984).

⁴ The theorem of implicit functions is readily generalized to a system of $n$ equations,

$$F_i(x_1, \ldots, x_m, y_1, \ldots, y_n) = 0, \qquad i = 1, \ldots, n,$$

by requiring that at a solution point $P^0 = (x_1^0, \ldots, x_m^0, y_1^0, \ldots, y_n^0)$ the functional (or Jacobian) determinant, $\det(\partial F_i / \partial y_j)$, is different from zero. Then the above $n$ equations can be solved for the $y_j$, i.e. $n$ functions $y_j = f_j(x_1, \ldots, x_m)$ can be defined in the vicinity of $P^0$.

3. Passive motion 

So far, we have shown how the actuators' elasticity can simplify the control task by providing an implicit map between a set of control inputs and a corresponding set of equilibrium configurations. These equilibrium configurations correspond to actual joint configurations when posture is maintained in the absence of external loads. During the execution of a movement, the static equilibrium configuration associated, at each time, with the control inputs is a moving "center of attraction" which generates a driving torque without any explicit dynamic computation. In the remainder of this paper we will show that the elastic properties of the actuators also lead to the solution of inverse kinematics problems in redundant motor systems.

The task of generating coordinated motion in a redundant mechanism has conventionally been described as an ill-posed inverse kinematics problem. Let us consider, for example, an extremely simple system composed of two linear elements connected in series, as shown in Figure 2A. The kinematic variables describing these elements are the lengths $l_1$ and $l_2$. Then, the net system length, $L$, is derived from the element lengths through the "direct kinematics" equation:

$$L = l_1 + l_2 \qquad (7)$$

Given the above expression, the inverse kinematics problem consists of finding the values of $l_1$ and $l_2$ corresponding to a desired total length, $L$. A geometrical interpretation of this problem is shown in Figure 2B, where the kinematic space of the system is represented as a two-dimensional coordinate frame. Each axis of this frame represents the length of one element. The oblique line intercepts both axes at a distance $L$ from the origin and corresponds to the constraint equation (7). Then, all (and only) the pairs of values $(l_1, l_2)$ corresponding to the points on this line are solutions of the inverse-kinematic problem for a desired displacement $L$. Optimization methods select a specific point by adding some further constraint. For example, the point $D = (L/2, L/2)$, at the intersection of the solution line with the perpendicular drawn from the origin, corresponds to a minimum-norm criterion.
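The minimum-norm point D can be computed with the pseudoinverse of the constraint. For a single linear constraint $\mathbf{a}^T \mathbf{l} = L$ the pseudoinverse solution reduces to $\mathbf{l} = \mathbf{a} L / (\mathbf{a}^T \mathbf{a})$; the small Python sketch below illustrates this standard result, with $\mathbf{a} = (1, 1)$ for Equation (7).

```python
def min_norm_solution(a, L):
    """Minimum-norm l satisfying the single linear constraint sum(a_i * l_i) = L.

    For one constraint row a, the pseudoinverse solution is l = a * L / (a . a).
    With a = (1, 1) (Equation (7)) this gives the point D = (L/2, L/2).
    """
    aa = sum(ai * ai for ai in a)
    return [ai * L / aa for ai in a]


L = 3.0
D = min_norm_solution([1.0, 1.0], L)
print(D)  # [1.5, 1.5]
# Any other solution on the line, e.g. (1.0, 2.0), has a larger norm:
print(sum(x * x for x in D), 1.0**2 + 2.0**2)  # 4.5 5.0
```

The norm of D is $L/\sqrt{2}$, which is also the distance of the solution line from the origin.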

The most common engineering approach to an underdetermined problem involves the definition of cost functions according to specific goals, such as avoiding singular configurations (Baillieul et al., 1984) and minimizing joint torques (Hollerbach and Suh, 1985), joint motion (Brooks, 1982) or kinetic energy (Khatib, 1983). Here, we present a different point of view, based on simulating passive motions within a field of elastic forces. This approach was introduced by Mussa-Ivaldi et al. (1988) and has been named the passive-motion paradigm. Essentially, the passive-motion paradigm exploits the fact that a passive displacement of a physical system characterized by impedance properties always leads to a well-posed problem. This is the case regardless of the imbalance between the degrees of freedom of the system and the number of dimensions necessary to fully describe the imposed displacement. The regularization principle implicitly implemented by this physical solution is the minimization of potential energy. This fact is dramatically demonstrated by any elastic sheet, such as the surface of a drum, which is characterized by an infinity of degrees of freedom. When a local displacement is applied at a point (e.g. by pushing with a pencil at the center of the drum), the whole surface changes its configuration so that the strain energy is minimized.

Figure 2: Passive motion in a simple system (see text).

In the example of Figure 2, the passive-motion paradigm requires the definition of the two components as elastic elements. Let us assume that these elements are tunable compliances (Figure 2C) with the following length/tension relations:

$$l_1 = c_1(u_1)\, f_1 + l0_1(u_1)$$
$$l_2 = c_2(u_2)\, f_2 + l0_2(u_2) \qquad (8)$$

The parameters $c_1$ and $c_2$ are the element compliances. The parameters $l0_1$ and $l0_2$ are rest-lengths, that is, the lengths assumed by each component when the tension is zero. Both compliance and rest-length depend upon each element's input in a known way. Let us also assume that the inputs are set to some initial values, $u_1^{INIT}$, $u_2^{INIT}$. Accordingly, the rest-lengths will assume the corresponding values $l0_1^{INIT}$, $l0_2^{INIT}$. If no external force is acting upon the system, it will settle to a "global" equilibrium position $L0^{INIT} = l0_1^{INIT} + l0_2^{INIT}$. The net compliance at this location is $C = c_1 + c_2$ ($C \neq 0$).



If at this point a total displacement $dL$ is imposed on the system endpoint, a net elastic force, $dF = C^{-1} dL$, is generated, opposing the displacement. This is also the force "seen" at steady state by each of the elements, that is, $df_1 = df_2 = dF$. Then, the two elements will be displaced by the amounts

$$dl_1 = c_1\, dF = \frac{c_1}{c_1 + c_2}\, dL$$
$$dl_2 = c_2\, dF = \frac{c_2}{c_1 + c_2}\, dL \qquad (9)$$

Thus, we effectively have an inverse-kinematic map from $dL$ to the pair $(dl_1, dl_2)$.



It is readily seen that this solution corresponds to the point of minimum potential energy compatible with the kinematic constraint:

$$dl_1 + dl_2 = dL \qquad (10)$$

In fact, the potential energy, $E$, in the vicinity of the equilibrium configuration, $(l0_1^{INIT}, l0_2^{INIT})$, is

$$E = \frac{1}{2}\left(\frac{dl_1^2}{c_1} + \frac{dl_2^2}{c_2}\right) \qquad (11)$$

Then, the first (second) Equation (9) is directly obtained by substituting (10) for $dl_2$ ($dl_1$) in (11) and solving $\partial E / \partial (dl_1) = 0$ ($\partial E / \partial (dl_2) = 0$).



Several important observations can be drawn from this simple example. First, the minimum-norm solution, $dl_1 = dl_2 = dL/2$, is the "natural" response to a passive displacement when $c_1 = c_2$. By choosing different patterns of compliance, one obtains different points on the solution set (Figure 2D) in a way which is strictly consistent with the notion of impedance: the more compliant one element is (relative to the others), the more it will tend to contribute to the global motion. In general, different settings of the elastic properties associated with the degrees of freedom will yield different configurations of minimum potential energy, given the same externally imposed displacements. Since we are assuming that the control signals can tune the local elastic properties of the actuators, we have at our disposal a mechanism to generate different patterns of movement, given a single kinematic task.







The second observation concerns information processing. The computational scheme used for deriving the expressions (9) is inherently modular and distributed. In Figure 2E, each physical element is represented by a processing unit which computes its position and effort according to the environmental constraints and to its own elastic behavior. Furthermore, a computational unit is introduced representing the elastic behavior of the whole system: its effort and position variables are $F$ and $L$, respectively, and its compliance is $C$ ($= c_1 + c_2$). Each element derives its own effort and position changes on the basis of locally available information. When a displacement $dL$ is imposed by the environment on the element representing the net system, the latter derives the effort change, $dF$, by multiplying $dL$ by its own stiffness, $C^{-1}$. Then, it sends this information to the elements representing the smaller components. These compute their displacements independently of each other on the basis of their own compliances. All the coupling information, in this case the net compliance, $C = c_1 + c_2$, is concentrated in the elastic element which represents the whole system.
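The modular scheme of Figure 2E can be sketched with two kinds of processing unit. In this illustration (not the authors' implementation), the unit for the whole chain holds the only coupling term, the net compliance, and broadcasts the effort change; each element unit then uses only its own compliance. The class names are hypothetical, and the chain is generalized from two to any number of elements in series.

```python
class ElementUnit:
    """Processing unit for one elastic element; knows only its own compliance."""

    def __init__(self, compliance):
        self.c = compliance

    def displacement(self, dF):
        # Local rule: own compliance times the broadcast effort change.
        return self.c * dF


class ChainUnit:
    """Unit representing the whole series chain; holds the net compliance C."""

    def __init__(self, elements):
        self.elements = elements
        self.C = sum(e.c for e in elements)  # compliances add in series

    def impose(self, dL):
        dF = dL / self.C  # effort change from the chain's own stiffness, 1/C
        return [e.displacement(dF) for e in self.elements]


chain = ChainUnit([ElementUnit(c) for c in (0.01, 0.02, 0.05)])
dls = chain.impose(0.008)
print([round(d, 6) for d in dls])  # -> [0.001, 0.002, 0.005]
```

Adding or removing an element changes only the net compliance held by the chain unit; no element needs to know about the others.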

It is possible to go one step further in the passive motion paradigm by letting each element change its input so that the new position is at equilibrium. Denoting by $\sigma_i = \partial l_i/\partial u_i$ the sensitivity of element $i$'s length to a change in its input, $u_i$, and assuming that $\sigma_i^{-1}$ exists, an input update $du_i = \sigma_i^{-1}\,dl_i$ applied independently to each controlled element is sufficient to bring the whole system to equilibrium in the new position $L' = L_{0,\mathrm{INIT}} + dL$.
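The input update can be checked numerically under an assumed actuator model. Below, each element's rest length is taken to be linear in its input, $l_{0i}(u_i) = a_i + \sigma_i u_i$ (a hypothetical model chosen for illustration; the text only requires $\sigma_i^{-1}$ to exist), and its force is $(l_i - l_{0i})/c_i$. After the passive displacement of Equation (9), each element applies $du_i = dl_i/\sigma_i$ using only local quantities, and the residual forces should vanish at $L' = L_{0,\mathrm{INIT}} + dL$.

```python
c = [0.02, 0.06]      # element compliances (m/N), illustrative
a = [0.30, 0.25]      # rest lengths at zero input (m), hypothetical
sigma = [0.05, 0.10]  # sensitivities d(rest length)/d(input), hypothetical
u = [0.4, 0.2]        # initial inputs, hypothetical


def rest_length(i, ui):
    """Assumed linear actuator model: l0_i(u_i) = a_i + sigma_i * u_i."""
    return a[i] + sigma[i] * ui


def element_force(i, length, ui):
    """Elastic force of element i at the given length and input."""
    return (length - rest_length(i, ui)) / c[i]


# Unloaded initial equilibrium: each element sits at its rest length.
lengths = [rest_length(i, u[i]) for i in range(2)]
L_init = sum(lengths)

# Passive displacement dL imposed at the endpoint and shared as in Eq. (9).
dL = 0.01
dF = dL / sum(c)
dl = [c[i] * dF for i in range(2)]
lengths = [lengths[i] + dl[i] for i in range(2)]

# Local input update du_i = dl_i / sigma_i, applied independently by each element.
u = [u[i] + dl[i] / sigma[i] for i in range(2)]

forces = [element_force(i, lengths[i], u[i]) for i in range(2)]
print("residual forces:", forces)
print("new length:", sum(lengths), "expected:", L_init + dL)
```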

As happens with other types of network (Hopfield, 1982), the overall system behavior is characterized by a global potential function. In our case, this function is the elastic energy associated with the system's spring-like properties. Following the local input-update rule outlined above is equivalent to the global requirement that the transition from one set of inputs to the next is chosen so as to minimize the net change of potential energy, compatibly with the task; that is, in this case, compatibly with the necessity of displacing the system equilibrium by an amount $dL$.

In the next section we will show that passive motion leads to well-defined inverse kinematic solutions also in redundant kinematic systems characterized by non-linear geometries. Specifically, we will consider the problem of transforming a displacement of the hand, $dx$, into the corresponding joint displacement, $dq$.

4. Inverse kinematics

The kinematic transformations for a redundant serial chain, such as the human arm, are a set of non-linear equations that map the joint configuration $q = [q_1, q_2, \ldots, q_N]^{\mathsf T}$ to a lower-dimensional set of task variables $x = [x_1, \ldots, x_M]^{\mathsf T}$ ($M < N$). We represent these equations as a single vector map

$$x = x(q). \qquad (12)$$

In the following discussion we will assume that this map is continuous and differentiable up to the second order within the entire workspace.

A typical choice for the task variables is the hand-position vector with respect to a base frame. However, it should be stressed that the definition of $x$ may change from one task to another. For example, we may carry an object without being concerned about its orientation. Then, $x$ has only







three translational components. In contrast, we may also be specifying an orientation. In that case, $x$ has three translational and three rotational components. Thus, the degree of redundancy, $N - M$, depends upon the nature of the task as well as upon the structure of the manipulator. An immediate consequence is that any general-purpose robot is bound to be used with some degree of redundancy whenever the task requirements become less stringent than those which guided the design process.

Given the kinematic transformation (12), the $M$-dimensional hand force $F$ maps into the $N$-dimensional joint torque $T$ as

$$T = J^{\mathsf T} F, \qquad (13)$$

where $J = \partial x/\partial q$ is the Jacobian of the kinematic equations (12). The force transformation (13) is a direct consequence of the principle of virtual work. This principle states that the mechanical work (the inner product of force and displacement) is invariant under coordinate transformations. That is:

$$F^{\mathsf T} dx = F^{\mathsf T} J\,dq = T^{\mathsf T} dq.$$
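To make the virtual-work argument concrete, the following sketch uses a planar three-link arm with a hand-written Jacobian. The link lengths, configuration, force and joint displacement are illustrative assumptions, and the helper names are hypothetical. It maps a hand force to joint torques with Equation (13) and checks that $F^{\mathsf T}dx$ equals $T^{\mathsf T}dq$; a finite-difference comparison confirms that the analytic Jacobian matches the forward kinematics.

```python
import math

LINKS = (1.0, 1.0, 1.0)  # assumed link lengths (m)


def absolute_angles(q):
    """Cumulative link angles phi_i = q_1 + ... + q_i."""
    phis, s = [], 0.0
    for qi in q:
        s += qi
        phis.append(s)
    return phis


def fk(q):
    """Hand position x(q) of a planar three-link arm, Eq. (12)."""
    p = absolute_angles(q)
    return [sum(li * math.cos(a) for li, a in zip(LINKS, p)),
            sum(li * math.sin(a) for li, a in zip(LINKS, p))]


def jacobian(q):
    """2x3 Jacobian J = dx/dq."""
    p = absolute_angles(q)
    return [[-sum(LINKS[i] * math.sin(p[i]) for i in range(j, 3)) for j in range(3)],
            [sum(LINKS[i] * math.cos(p[i]) for i in range(j, 3)) for j in range(3)]]


q = [0.3, 1.1, 0.7]          # joint angles (rad)
F = [2.0, -1.5]              # hand force (N)
dq = [1e-3, -2e-3, 0.5e-3]   # small joint displacement (rad)

J = jacobian(q)
T = [sum(J[k][i] * F[k] for k in range(2)) for i in range(3)]    # Eq. (13): T = J^T F
dx = [sum(J[k][i] * dq[i] for i in range(3)) for k in range(2)]  # dx = J dq

work_hand = sum(f * d for f, d in zip(F, dx))
work_joint = sum(t * d for t, d in zip(T, dq))
print(work_hand, work_joint)

# Finite-difference check of the Jacobian against the forward kinematics.
x0, x1 = fk(q), fk([a + b for a, b in zip(q, dq)])
dx_fd = [b - a for a, b in zip(x0, x1)]
print(dx, dx_fd)
```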



Note that, while a displacement in joint coordinates maps uniquely into a displacement in hand coordinates ($dx = J\,dq$), the reverse is true for the forces: a tip force $F$ maps uniquely into a torque vector $T$. Thus, for a redundant serial mechanism we have two complementary ill-posed problems:

1. Finding the hand force corresponding to an arbitrary set of joint torques. Since there are more constraint equations than unknowns, this problem may have no solution: joint torques cannot be selected arbitrarily.

2. Finding the joint configuration corresponding to a given hand position. There are more unknowns than equations and the problem has infinitely many solutions.
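Both ill-posed problems can be seen in a small example, sketched here for a planar three-link arm with assumed unit link lengths and an illustrative configuration (helper names are hypothetical). For a $2 \times 3$ Jacobian of rank 2, the cross product of its two rows spans the null space. Adding any multiple of this vector to a joint displacement leaves the hand displacement unchanged (problem 2), and because every realizable torque $J^{\mathsf T}F$ is orthogonal to it, a torque vector with a component along it cannot be produced by any hand force (problem 1).

```python
import math

LINKS = (1.0, 1.0, 1.0)  # assumed link lengths (m)


def jacobian(q):
    """2x3 Jacobian of a planar three-link arm."""
    phis, s = [], 0.0
    for qi in q:
        s += qi
        phis.append(s)
    return [[-sum(LINKS[i] * math.sin(phis[i]) for i in range(j, 3)) for j in range(3)],
            [sum(LINKS[i] * math.cos(phis[i]) for i in range(j, 3)) for j in range(3)]]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, 1.1, 0.7]
J = jacobian(q)
n = cross(J[0], J[1])  # null-space direction: J n = 0 when rank(J) = 2

# Problem 2: a whole family of joint displacements gives the same hand displacement.
dq_p = [1e-3, -2e-3, 0.5e-3]
for t in (-0.01, 0.0, 0.01):
    dq = [p + t * v for p, v in zip(dq_p, n)]
    print(t, [dot(row, dq) for row in J])

# Problem 1: any realizable torque J^T F is orthogonal to n, so a torque with a
# component along n (here, a pure shoulder torque) has no hand-force solution.
F = [2.0, -1.5]
T_real = [J[0][i] * F[0] + J[1][i] * F[1] for i in range(3)]
print(dot(T_real, n), dot([1.0, 0.0, 0.0], n))
```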



Here, we focus on the solution to the second problem, which involves a transformation from task to joint coordinates. As is the case for the simpler spring system described in the previous section, a passive motion, $dx$, imposed by the environment on the tip generates a unique configurational displacement $dq$. This passive solution can be derived in two equivalent ways. In one case, one may search for the configuration that corresponds to a minimum of potential energy compatible with $dx$. Alternatively, $dq$ is obtained from the combination of three algebraic maps.

1. The first map is provided by the hand stiffness that transforms $dx$ into a force change $dF$ as:

$$dF = K_{\mathrm{hand}}\,dx. \qquad (14)$$



2. The second map is the transformation from $dF$ to the corresponding torque change, $dT$. This map is obtained by differentiating the force/torque transformation (13):

$$dT = J^{\mathsf T} dF + dJ^{\mathsf T} F. \qquad (15)$$

The second term on the right-hand side is required to account for the fact that, at steady state, the configuration changes by an amount $dq$. Since the Jacobian is configuration dependent, $J$ changes by an amount $dJ$. With simple calculations we obtain:

$$dJ^{\mathsf T} F = \Gamma\,dq,$$

where $\Gamma$ is an $N \times N$ matrix which has the physical dimension of a joint stiffness and whose components are:

$$\Gamma_{ij} = \sum_{k=1}^{M} \frac{\partial^2 x_k}{\partial q_i\,\partial q_j}\,F_k. \qquad (16)$$







Using this expression, the transformation (15) becomes

$$dT = J^{\mathsf T} dF + \Gamma\,dq. \qquad (17)$$

3. Finally, the third map is the transformation from $dT$ to $dq$, which is provided by the joint compliance, $c$:

$$dq = c\,dT. \qquad (18)$$

Here, we assume that the compliance, $c$, is the Jacobian of some arbitrarily defined control law (such as $q = q_0 + cT$). This control law can either be explicitly implemented by some feedback mechanism or it can be a way of representing the steady-state behavior of some elastic actuator. In both cases we make the further assumption that $\det(c) \neq 0$. Therefore, the causality of Equation (18) can be inverted to derive an output torque in response to an input displacement:

$$dT = k\,dq, \qquad (19)$$

where $k = c^{-1}$ is the joint stiffness.
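Equation (16) can be checked numerically. For a planar three-link arm (assumed unit link lengths; configuration, force and displacement chosen for illustration; helper names hypothetical), the second derivatives of the hand coordinates have a closed form, so $\Gamma$ can be built directly and compared with the change of $J^{\mathsf T}F$ under a small $dq$ at fixed $F$. The sketch also confirms that $\Gamma$ is symmetric and vanishes when $F = 0$.

```python
import math

LINKS = (1.0, 1.0, 1.0)  # assumed link lengths (m)


def absolute_angles(q):
    phis, s = [], 0.0
    for qi in q:
        s += qi
        phis.append(s)
    return phis


def jacobian(q):
    p = absolute_angles(q)
    return [[-sum(LINKS[i] * math.sin(p[i]) for i in range(j, 3)) for j in range(3)],
            [sum(LINKS[i] * math.cos(p[i]) for i in range(j, 3)) for j in range(3)]]


def gamma(q, F):
    """Eq. (16): Gamma_ab = sum_k F_k d2x_k / (dq_a dq_b).

    For this arm both second derivatives depend only on m = max(a, b):
    d2x = -sum_{i>=m} l_i cos(phi_i), d2y = -sum_{i>=m} l_i sin(phi_i).
    """
    p = absolute_angles(q)
    G = [[0.0] * 3 for _ in range(3)]
    for a in range(3):
        for b in range(3):
            m = max(a, b)
            hx = -sum(LINKS[i] * math.cos(p[i]) for i in range(m, 3))
            hy = -sum(LINKS[i] * math.sin(p[i]) for i in range(m, 3))
            G[a][b] = F[0] * hx + F[1] * hy
    return G


def jt_f(q, F):
    """Joint torque J(q)^T F, Eq. (13)."""
    J = jacobian(q)
    return [J[0][i] * F[0] + J[1][i] * F[1] for i in range(3)]


q = [0.3, 1.1, 0.7]
F = [2.0, -1.5]
dq = [1e-5, -2e-5, 0.5e-5]

G = gamma(q, F)
fd = [b - a for a, b in zip(jt_f(q, F), jt_f([x + d for x, d in zip(q, dq)], F))]  # dJ^T F
lin = [sum(G[i][j] * dq[j] for j in range(3)) for i in range(3)]                  # Gamma dq
print(fd)
print(lin)
```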

We can combine the three differential maps (14), (17) and (18), as shown in Figure 3, to obtain a single transformation from $dx$ to $dq$:

$$dq = (k - \Gamma)^{-1} J^{\mathsf T} K_{\mathrm{hand}}\,dx. \qquad (20)$$

By imposing that this map is actually an inverse of the direct kinematics, $dx = J\,dq$, we obtain the following expression for the hand stiffness:

$$K_{\mathrm{hand}} = \left(J\,c_{\mathrm{LIN}}\,J^{\mathsf T}\right)^{-1}, \qquad (21)$$

where we have set $c_{\mathrm{LIN}} = (k - \Gamma)^{-1}$. We call $c_{\mathrm{LIN}}$ the linearized joint compliance. Its inverse, the linearized joint stiffness, $k_{\mathrm{LIN}}$, differs from the actual joint stiffness by the correction matrix $\Gamma$, which accounts for the position dependency of the Jacobian matrix. Using this correction, the end-effector compliance is obtained as a linear transformation of $c_{\mathrm{LIN}}$ (Equation (21)).
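A numerical sketch of Equations (20) and (21), under illustrative assumptions: a planar three-link arm with unit link lengths, a diagonal joint stiffness $k$, and a hand force $F$ used to evaluate $\Gamma$. It forms $c_{\mathrm{LIN}} = (k - \Gamma)^{-1}$ and the hand stiffness of Equation (21), then checks that the joint displacement given by Equation (20) reproduces the imposed $dx$ through $dx = J\,dq$. The small matrix helpers (hypothetical names) are written out only to keep the block dependency-free.

```python
import math

LINKS = (1.0, 1.0, 1.0)  # assumed link lengths (m)


def absolute_angles(q):
    phis, s = [], 0.0
    for qi in q:
        s += qi
        phis.append(s)
    return phis


def jacobian(q):
    p = absolute_angles(q)
    return [[-sum(LINKS[i] * math.sin(p[i]) for i in range(j, 3)) for j in range(3)],
            [sum(LINKS[i] * math.cos(p[i]) for i in range(j, 3)) for j in range(3)]]


def gamma(q, F):
    """Eq. (16) for the planar three-link arm."""
    p = absolute_angles(q)
    return [[-sum(LINKS[i] * (F[0] * math.cos(p[i]) + F[1] * math.sin(p[i]))
                  for i in range(max(a, b), 3)) for b in range(3)] for a in range(3)]


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


q = [0.3, 1.1, 0.7]
k = [[60.0, 0, 0], [0, 50.0, 0], [0, 0, 40.0]]  # joint stiffness (N m/rad), illustrative
F = [2.0, -1.5]                                  # hand force (N), illustrative

J = jacobian(q)
Jt = transpose(J)
G = gamma(q, F)
c_lin = inverse([[k[i][j] - G[i][j] for j in range(3)] for i in range(3)])
K_hand = inverse(matmul(matmul(J, c_lin), Jt))      # Eq. (21)

dx = [[0.01], [-0.005]]                             # imposed hand displacement (m)
dq = matmul(matmul(matmul(c_lin, Jt), K_hand), dx)  # Eq. (20)
print("K_hand =", K_hand)
print("J dq =", matmul(J, dq), "vs dx =", dx)
```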

[Figure 3 graph: joint-coordinate nodes and end-point nodes]

Figure 3: 

Differential transformation graph for a redundant arm. The graph is fully connected: each node 
can be reached from any other node through a single path. The thicker arrows define a path from dx 
to dq. This path is equivalent to a weighted generalized-inverse of the Jacobian, J. 



In summary, the elastic properties of the joint actuators, expressed by Equation (18), provide a natural solution to the inverse kinematic problem for a redundant mechanism. Making use of $c_{\mathrm{LIN}}$, we can write this solution as the differential transformation












$$dq = c_{\mathrm{LIN}}\,J^{\mathsf T}\left(J\,c_{\mathrm{LIN}}\,J^{\mathsf T}\right)^{-1} dx, \qquad (22)$$

which is a weighted generalized inverse of $J$. This expression characterizes the change of configuration corresponding to a quasi-static displacement of the endpoint. It is therefore equivalent to minimizing potential energy while satisfying the constraint imposed by the motion of the endpoint.

5. Passive motion is integrable 

We have shown that passive motion provides a local solution to the inverse kinematic problem (Equation (22)) as a weighted generalized inverse of the Jacobian matrix. The general expression for the weighted generalized inverse of $J$ is

$$P_w = w\,J^{\mathsf T}\left(J\,w\,J^{\mathsf T}\right)^{-1}, \qquad (23)$$

where $w$ is some $N \times N$ weight matrix. With this generalized inverse, the configuration change $dq = P_w\,dx$ is guaranteed to satisfy the kinematic constraint

$$dx = J\,dq$$

while, at the same time, minimizing the quadratic form

$$E_w = dq^{\mathsf T} w^{-1} dq.$$

In particular, when $w$ is the unit matrix, $I$, the quadratic form $E_w$ is the squared norm of the configuration change, $dq^{\mathsf T} dq$. Thus, the generalized inverse $P_I = J^{\mathsf T}(J J^{\mathsf T})^{-1}$, which is also known as the Moore-Penrose pseudoinverse of $J$, corresponds to a minimum-norm criterion (Ben-Israel and Greville, 1980). The map corresponding to an externally imposed displacement is equivalent to the application of the generalized inverse $P_{c_{\mathrm{LIN}}}$. This is consistent with the physical requirement that the passive configuration change is constrained by a minimum potential energy criterion.
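The two properties of Equation (23) can be checked numerically. In this sketch (the arm, unit link lengths, configuration and diagonal weight are illustrative assumptions; helper names are hypothetical), $dq = P_w\,dx$ satisfies $J\,dq = dx$, and its weighted gradient $w^{-1}dq$ is orthogonal to the null space of $J$, so moving along the null space can only increase $E_w$. Setting all weights to 1 gives the Moore-Penrose case.

```python
import math

LINKS = (1.0, 1.0, 1.0)  # assumed link lengths (m)


def jacobian(q):
    phis, s = [], 0.0
    for qi in q:
        s += qi
        phis.append(s)
    return [[-sum(LINKS[i] * math.sin(phis[i]) for i in range(j, 3)) for j in range(3)],
            [sum(LINKS[i] * math.cos(phis[i]) for i in range(j, 3)) for j in range(3)]]


def weighted_pinv(J, w_diag):
    """Eq. (23), P_w = w J^T (J w J^T)^-1, for a diagonal weight matrix w."""
    A = [[sum(J[r][i] * w_diag[i] * J[s][i] for i in range(3)) for s in range(2)]
         for r in range(2)]  # J w J^T (2x2)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A_inv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]
    return [[w_diag[i] * sum(J[r][i] * A_inv[r][s] for r in range(2)) for s in range(2)]
            for i in range(3)]


def energy(dq, w_diag):
    """Quadratic form E_w = dq^T w^-1 dq."""
    return sum(d * d / wi for d, wi in zip(dq, w_diag))


q = [0.3, 1.1, 0.7]
J = jacobian(q)
dx = [0.01, -0.005]
w = [1.0, 0.5, 2.0]  # illustrative diagonal weights ([1, 1, 1] is Moore-Penrose)

P = weighted_pinv(J, w)
dq = [sum(P[i][s] * dx[s] for s in range(2)) for i in range(3)]
print("J dq =", [sum(J[r][i] * dq[i] for i in range(3)) for r in range(2)])

# Null-space direction of the 2x3 Jacobian (cross product of its rows).
n = [J[0][1] * J[1][2] - J[0][2] * J[1][1],
     J[0][2] * J[1][0] - J[0][0] * J[1][2],
     J[0][0] * J[1][1] - J[0][1] * J[1][0]]
E0 = energy(dq, w)
for t in (-1e-3, 1e-3):
    other = [d + t * v for d, v in zip(dq, n)]  # another solution of J dq = dx
    print(t, energy(other, w) - E0)             # positive: P_w gives the minimum
```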

There is no a priori guarantee that a weighted generalized inverse of $J$ is integrable, i.e. that within some domain of the workspace, $P_w$ is the Jacobian of some global inverse-kinematic function mapping any given position $x$ into a single configuration $q$. Whenever such a map exists, iterating its Jacobian from a starting position $x_0$ to a final position $x_1$ results in a final configuration $q_1$. The latter depends upon the initial configuration $q_0$ but not upon the particular workspace path chosen to join $x_0$ to $x_1$. Equivalently, a necessary and sufficient condition for a local generalized-inverse map to be integrable is that the iteration of this map along any closed path brings the system back to the same starting configuration. A number of investigations (Liegeois, 1977; Klein and Huang, 1983; Hollerbach and Suh, 1985; Wampler, 1987) have demonstrated that the most common generalized-inverse operators do not provide integrable maps. Most notably, Klein and Huang (1983) showed that iterating the Moore-Penrose pseudoinverse along closed paths can produce a variety of non-cyclic behaviors. In some cases, as the hand passes repeatedly through the same point, the joint configuration changes until a limit configuration is reached. However, this limit configuration may depend upon the direction of travel around the path. In other cases, the configuration keeps changing without reaching any limit.

Mussa-Ivaldi and Hogan (1989) have recently demonstrated that the weighted generalized inverse, $P_{c_{\mathrm{LIN}}}$, corresponding to an externally imposed displacement is integrable within any simply connected domain in which the manipulator endpoint does not lose mobility (i.e. in which







$\det(J\,c_{\mathrm{LIN}}\,J^{\mathsf T}) \neq 0$). This result is consistent with the physical notion that the minimum potential energy constraint is sufficient to uniquely define a configuration corresponding to a given location of the endpoint. Consider, for example, a redundant serial kinematic chain. Let each joint be characterized by a fixed torque/angle relation so that the linkage is at equilibrium at a configuration corresponding to a tip location $x_0$. This relation can be simply obtained by placing a "winding spring" on each joint. It is intuitively evident that if we force the tip along a closed path terminating back at $x_0$, the whole mechanism will return to the starting configuration. It is also intuitive that the only way to achieve a different configuration is by "winding" the joint springs. This can be achieved by moving the tip along a path which encloses some singular points of the workspace, that is, by carrying out the iteration in a non-simply-connected domain.

From an algebraic point of view, the key to the integrability of passive motion is provided by the matrix $\Gamma$, whose elements are given by Equation (16). This matrix is an impedance component which contributes to the expression of $c_{\mathrm{LIN}}$. Let us consider a situation in which the following control law is applied to each joint:

$$T_i = k\,(q_i - q_{0i})$$

($k$ is the same scalar constant for all the joints.) Then, the net joint stiffness is proportional to the unit matrix. Let us assume that the mechanism is at the initial equilibrium configuration $q_0$, corresponding to the initial endpoint location $x_0$. As the endpoint is forced by the environment to follow a closed path, the configuration undergoes a sequence of displacements

$$dq = \left(kI - \Gamma(q)\right)^{-1} J^{\mathsf T}\left(J\left(kI - \Gamma(q)\right)^{-1} J^{\mathsf T}\right)^{-1} dx. \qquad (24)$$

Initially, $\Gamma$ is equal to zero since the endpoint is at equilibrium ($F = 0$). Thus, the initial displacement is simply given by the Moore-Penrose pseudoinverse, $P_I$. As the tip is moved away from equilibrium, the elastic force increases and, consequently, the $\Gamma$ term also increases. On the way back, $\Gamma$ decreases. When the endpoint has returned to the initial position $x_0$, the whole mechanism is guaranteed to be at the starting configuration $q_0$ (provided that the path did not enclose any singularity). In contrast, if the displacements around the whole path were calculated using the Moore-Penrose pseudoinverse (i.e. with $\Gamma = 0$ everywhere), then, on returning to $x_0$, the configuration would in general have been different from the starting value $q_0$. Thus, the matrix $\Gamma(q)$ is a configuration-dependent correction that is equivalent to a "memory" of the initial conditions.
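The closed-path behavior can be reproduced in a small simulation. This is a sketch under stated assumptions, not the simulation used for the figures: a planar three-link arm with unit link lengths, a scalar joint stiffness (which cancels out of Equation (24)), an arbitrarily chosen triangular tip path, and midpoint integration with feedback on the tip position. The hand force needed for $\Gamma$ is recovered from the spring torques $k(q - q_0)$ by least squares; all helper names are hypothetical.

```python
import itertools
import math

LINKS = (1.0, 1.0, 1.0)  # assumed link lengths (m)
K = 10.0                 # scalar joint stiffness; it cancels out of Eq. (24)


def fk(q):
    p = list(itertools.accumulate(q))  # absolute link angles
    return [sum(li * math.cos(a) for li, a in zip(LINKS, p)),
            sum(li * math.sin(a) for li, a in zip(LINKS, p))]


def jacobian(q):
    p = list(itertools.accumulate(q))
    return [[-sum(LINKS[i] * math.sin(p[i]) for i in range(j, 3)) for j in range(3)],
            [sum(LINKS[i] * math.cos(p[i]) for i in range(j, 3)) for j in range(3)]]


def gamma(q, F):
    """Eq. (16): Gamma_ab = sum_k F_k d2x_k/(dq_a dq_b) for this arm."""
    p = list(itertools.accumulate(q))
    return [[-sum(LINKS[i] * (F[0] * math.cos(p[i]) + F[1] * math.sin(p[i]))
                  for i in range(max(a, b), 3)) for b in range(3)] for a in range(3)]


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def generalized_inverse(q, q0, use_gamma):
    """Eq. (24): (kI - Gamma)^-1 J^T (J (kI - Gamma)^-1 J^T)^-1."""
    J = jacobian(q)
    Jt = [list(col) for col in zip(*J)]
    G = [[0.0] * 3 for _ in range(3)]  # Gamma = 0 gives the Moore-Penrose pseudoinverse
    if use_gamma:
        # Hand force balancing the joint springs: J^T F = k (q - q0), least squares.
        T = [[K * (a - b)] for a, b in zip(q, q0)]
        F = [r[0] for r in matmul(inverse(matmul(J, Jt)), matmul(J, T))]
        G = gamma(q, F)
    c_lin = inverse([[K * (i == j) - G[i][j] for j in range(3)] for i in range(3)])
    cJt = matmul(c_lin, Jt)
    return matmul(cJt, inverse(matmul(J, cJt)))


def closed_path(q0, corners, use_gamma, steps=400):
    """Move the tip along straight segments through the corners (midpoint steps)."""
    q = list(q0)
    for xa, xb in zip(corners, corners[1:]):
        for s in range(1, steps + 1):
            target = [a + (b - a) * s / steps for a, b in zip(xa, xb)]
            dx = [[t - x] for t, x in zip(target, fk(q))]  # also absorbs tip drift
            half = matmul(generalized_inverse(q, q0, use_gamma), dx)
            qm = [qi + 0.5 * d[0] for qi, d in zip(q, half)]
            full = matmul(generalized_inverse(qm, q0, use_gamma), dx)
            q = [qi + d[0] for qi, d in zip(q, full)]
    return q


q0 = [0.3, 1.2, 0.9]  # initial equilibrium configuration (rad)
x0 = fk(q0)
corners = [x0, [x0[0] + 0.3, x0[1] + 0.2], [x0[0] - 0.1, x0[1] + 0.35], x0]
q_gamma = closed_path(q0, corners, use_gamma=True)
q_mp = closed_path(q0, corners, use_gamma=False)
print("drift with Gamma:   ", [v - w for v, w in zip(q_gamma, q0)])
print("drift Moore-Penrose:", [v - w for v, w in zip(q_mp, q0)])
```

With $\Gamma$ included, the configuration should return to $q_0$ up to integration error; the Moore-Penrose drift is printed only for comparison, and its size depends on the loop chosen.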

6. Joint-limits and singularities 

In order to illustrate the computational value of integrability, let us consider the problem of joint-limit avoidance. All biological joints are characterized by a limited range of motion. One possible role of joint limits is to help prevent the arm from falling into kinematic singularities located in the interior of the workspace. A kinematic singularity is a configuration in which $\mathrm{rank}(J)$ is less than the task-space dimension. In this case the manipulator loses mobility by becoming "infinitely stiff" in one or more directions. For example, with a three-link planar arm, singular configurations are reached when the elbow angle is 0 or 180 degrees (forearm parallel to the upper arm) and the wrist angle is 180 degrees (hand folded back onto the forearm). Since the physiological elbow and wrist cannot reach these angles, the corresponding singularities are avoided. However, joint limits







introduce another type of singularity: the endpoint stiffness becomes singular whenever the joint compliance matrix loses rank (more precisely, whenever $\mathrm{rank}(c_{\mathrm{LIN}}) < M$). Since the joint limits are angles at which the joints reach zero compliance, their presence introduces new singular points.

In the context of passive motion, one way to avoid joint limits is to design the stiffness of each joint so that it increases as the joint approaches a limit angle. Thus, in a passive displacement of the endpoint, when a joint gets near a limit, a larger portion of the motion is taken over by those other joints that are closer to their midrange. Following this rationale, one would expect that singular points are reached only when several joints are simultaneously close to their limits. This approach is illustrated in Figure 4 for a three-link planar manipulator. The shaded area near each joint indicates the range of motion (Figure 4A). Within this range the stiffness is set as (Figure 4B):

$$k(q) = \cos^{-2}\!\left(A\,(q - q_{\mathrm{MID}})\right), \qquad (25)$$

$$A = \frac{\pi}{q_{\mathrm{MAX}} - q_{\mathrm{MIN}}}, \qquad q_{\mathrm{MID}} = \frac{q_{\mathrm{MIN}} + q_{\mathrm{MAX}}}{2}.$$



[Figure 4 panels: A) joint ranges between minimum and maximum limits; B) stiffness (N·m/rad) versus joint angle]




Figure 4: 

Three-joint planar manipulator with limited joint ranges. A) The shaded areas indicate the range 
of motion of each joint. B) Joint limits are implemented by the joint-stiffness increasing at the 
extremes of the ranges of motion. 
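Equation (25) is easy to tabulate. The sketch below (a hypothetical helper; the unit reference stiffness is an assumption, since no scale factor is stated) evaluates the stiffness law for the elbow range given for Figure 5, 0° to 170°. The stiffness takes its minimum at mid-range and diverges as the joint approaches either limit, so the joint compliance tends to zero there.

```python
import math


def joint_stiffness(q, q_min, q_max):
    """Eq. (25): k(q) = cos^-2(A (q - q_MID)), A = pi / (q_MAX - q_MIN).

    Returned in units of an assumed reference stiffness (e.g. 1 N m/rad).
    """
    A = math.pi / (q_max - q_min)
    q_mid = 0.5 * (q_min + q_max)
    return 1.0 / math.cos(A * (q - q_mid)) ** 2


# Elbow range 0-170 degrees, as given for Figure 5.
lo, hi = 0.0, math.radians(170.0)
for deg in (85, 100, 130, 160, 169):
    print(deg, round(joint_stiffness(math.radians(deg), lo, hi), 2))
```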



Figure 5 shows a comparison between two passive motions from the same starting configuration to the same endpoint locations. In Figure 5A, the manipulator joints have equal and constant stiffness throughout the workspace. At the end of the movement the joint configuration ($q_1 = 39^\circ$, $q_2 = 168^\circ$, $q_3 = 27^\circ$) is close to a kinematic singularity ($q_2 = 180^\circ$, $q_3 = 0^\circ$). In Figure 5B, the same manipulator has variable joint stiffness, according to Equation (25). Thus, the motion of the elbow has been significantly reduced at the expense of larger displacements of the wrist and of the shoulder. Consequently, the final configuration ($q_1 = 66^\circ$, $q_2 = 150^\circ$, $q_3 = 36^\circ$) is at a larger distance from the singularity.









Figure 5: 

Dangerous locations. A) The arm moves from the start to the end locations with uniform joint stiffness. B) The same endpoint movement (from the same starting configuration) is performed with the joint stiffness increasing towards the joint limits. (Joint ranges: shoulder and wrist, ±140°; elbow, 0° to 170°.)



We simulated the passive movements by iterating an integrable generalized inverse of the Jacobian matrix (Equation (22)). For joint-limit avoidance, the expression of this generalized inverse is:

$$P_{c_{\mathrm{LIN}}} = \left(k(q) - \Gamma\right)^{-1} J^{\mathsf T}\left(J\left(k(q) - \Gamma\right)^{-1} J^{\mathsf T}\right)^{-1}, \qquad (26)$$

where $k(q)$ is a diagonal joint-stiffness matrix whose elements are given by Equation (25). As has been shown by Mussa-Ivaldi and Hogan (1989), the integrability of $P_{c_{\mathrm{LIN}}}$ is entirely a consequence of the $\Gamma$ matrix. This point is illustrated by the following example. Let us consider the generalized inverse, $P_c$, that is obtained from $P_{c_{\mathrm{LIN}}}$ by removing the $\Gamma$ matrix:

$$P_c = k(q)^{-1} J^{\mathsf T}\left(J\,k(q)^{-1} J^{\mathsf T}\right)^{-1}. \qquad (27)$$

$P_c$ is a weighted generalized inverse whose weight matrix depends upon the joint configuration according to Equation (25). Then, consider the task of moving the endpoint of the three-link manipulator on a closed target pattern, as shown in Figure 6. The starting target, 1, corresponds to the initial configuration ($q_1 = 0^\circ$, $q_2 = 85^\circ$, $q_3 = 0^\circ$). From this configuration we iterated $P_c$ until the hand reached target 2. Then we continued the iteration of $P_c$ on a straight line to target 3, and finally, the path terminated at target 1. Note that the final and the initial configurations are different, in spite of the absence of singularities within the triangle 1-2-3. Thus, $P_c$ is not integrable.

This result is explained as follows. As the tip goes to target 2, the elbow joint gets close to its 
lower limit (0°). Target 2 is indeed near the boundary of the workspace, where the inverse 








Figure 6: 

Joint-limit avoidance. Iteration of a non-integrable generalized inverse around a closed path. The 
arm does not return to the starting configuration (see text.) 



kinematics is unique and singular. At this location the elbow has a large stiffness compared to the other joints. Thus, when the hand moves to target 3 and then back to target 1, the elbow angle tends to remain constant. Consequently, the final position is reached with an elbow angle that is smaller than the initial value at the same tip location. Using $P_c$, once a joint gets near its limits it tends to remain "stuck", since $P_c$ has no memory.

In contrast, when we used $P_{c_{\mathrm{LIN}}}$ to generate the same motion of the tip, the joint configuration returned to the initial value as the tip moved from target 3 back to target 1 (Figure 7). Note that the configuration obtained at target 2 by iterating $P_{c_{\mathrm{LIN}}}$ is nearly identical to that obtained by iterating $P_c$. This result is merely a consequence of the fact that target 2 is close to the workspace boundary (where the manipulator "loses" its redundancy). Therefore, also with $P_{c_{\mathrm{LIN}}}$ the elbow stiffness at target 2 is larger than the other joint stiffness components. However, this differential operator now contains a memory of the initial conditions, the matrix $\Gamma$, which is sufficient to obtain a larger elbow motion when the tip returns to more internal regions of the workspace.

7. Passive behavior and active control 

So far we have represented the kinematic tasks (that is, the desired motions of the endpoint) as simulated passive motions that are imposed by the environment upon the end-effector. The manipulator was assumed to be characterized by fixed elastic properties (Equation (18) or (25)).









Figure 7: 

Joint-limit avoidance, iteration of an integrable generalized inverse around the same path of 
Figure 6. The arm returns to its starting configuration (see text.) 



Then, the degrees of freedom move in such a way as to minimize the potential energy stored in the compliance. We have shown that the matrix $\Gamma$ is essential to simulate this process correctly, by taking into account the effects of the non-linear geometries which become significant as the endpoint moves away from the equilibrium position. Thus, we have shown that $\Gamma$ is equivalent to a memory term.

An alternative point of view is to consider $\Gamma$ as an impedance instead of as a geometrical term ($\Gamma$ has indeed the physical dimension of a joint stiffness). Let us assume that the desired motion is expressed as a sequence of equilibrium positions. To achieve this, the $N$ configuration variables, $q_i$, must depend not only upon the generalized forces, but also upon $N$ control inputs, $u_i$. The required behavior of the actuators is then characterized by a controllable compliance function of the form:

$$q = \Phi(T, u). \qquad (28)$$

The equilibrium configuration associated, at steady state, with an input vector $u$ is

$$q_0(u) = \Phi(0, u). \qquad (29)$$

A change, $du$, of the input causes the equilibrium configuration to be updated by an amount:

$$dq_0 = \sigma\,du, \qquad \sigma = \frac{\partial \Phi(0, u)}{\partial u}. \qquad (30)$$

Here, $\sigma$ is a local sensitivity matrix which we will assume to be non-singular ($\det(\sigma) \neq 0$). Then, as the input changes smoothly in time, the above equation defines a sequence of static equilibria: a virtual trajectory.






The inverse kinematic problem becomes that of finding an appropriate sequence of inputs $u(t)$, given a desired trajectory of the endpoint, $x_d(t)$. One way to do this is to simulate externally imposed displacements which drive the joints away from equilibrium; then, at the end of each displacement, the input is modified to set equilibrium at the new manipulator configuration. This method, which has been called the passive-motion paradigm, leads to a distributed representation of motor redundancy (Mussa-Ivaldi et al., 1988). Starting from an equilibrium position, $x_0$ (corresponding to $q_0(u)$), using the generalized inverse $P_c$ ($c = \partial\Phi/\partial T$) corresponds to iterating a change of equilibrium, $dq_0 = P_c\,dx_d$, by updating the input with $du = \sigma^{-1} dq_0$. However, since this process occurs about equilibrium, the matrix $\Gamma$ is zeroed at each step, and $P_{c_{\mathrm{LIN}}} = P_c$. Therefore, there is no guarantee that the input update is integrable, unless $c = c'_{\mathrm{LIN}}$ for some compliance matrix $c'$. This condition corresponds to requiring that, at each equilibrium configuration, the joint compliance, $c$, is:

$$c = \left(c'^{-1} - \Gamma\right)^{-1}, \qquad (31)$$

where $c'$ is the compliance associated with some known differentiable function $\Phi'(T)$.

8. Summary and conclusions 

In this paper we have considered the computational value of the mechanical properties which 
characterize the biological actuators. In particular, we have focussed on the force/length relation 
which characterizes the behavior at steady state of human muscles. The fact that muscle force has 
a significant dependence upon muscle length constitutes a major departure of muscle’s behavior 
from the canonical idea of a "good" robotic actuator: a force generator with negligible impedance. 
However, here we have shown that instead of implying a computational burden, the muscle 
properties offer simple solutions to some major problems such as (a) the problem of mapping a large 
set of control inputs into the equilibrium posture of a limb, (b) the problem of generating movements 
without computing the inverse dynamics and (c) the inverse kinematic problem for redundant 
systems. 

Muscle elasticity provides a unified approach to posture and movement. Postures are equilibrium configurations resulting from the balance of agonist and antagonist muscles. These configurations act as points of attraction whenever an external perturbation generates some displacement. At the same time, the equilibrium posture is associated with a pattern of neural commands directed to each muscle. Thus, as the neural inputs to the muscles change in time, the static equilibrium shifts along a "virtual trajectory" and acts as a center of attraction interacting with limb inertia and viscosity. Here, we have shown that the virtual trajectory provides a unique map from a high-dimensional set of neural-activation variables to lower-dimensional kinematic variables: the joint equilibrium configuration and the equilibrium position of the endpoint. The condition for this mapping to be defined is simply that the net joint and hand stiffnesses have non-zero determinants. So far, this condition has been found to be satisfied in multi-joint posture (Mussa-Ivaldi et al., 1985).







Elastic properties also provide a natural solution to ill-posed coordination problems arising in redundant systems, that is, systems with an excess of degrees of freedom with respect to the number of kinematic variables which fully describe a task. It is widely acknowledged that adding degrees of freedom to a manipulator can significantly improve its dexterity, that is, its ability to perform different tasks in a wide range of environmental conditions. However, redundancy also introduces new challenging problems. One is that of computing the different joint-coordination patterns corresponding to a single end-effector task. Another is to ensure that local inverse-kinematic solutions correspond to well-defined global maps. Both problems are solved by the physics of elastic actuators when a passive motion is imposed by the environment on the endpoint.

Passive motion offers a natural solution to ill-posed kinematic problems. This solution rests upon the principle that the potential energy is minimized compatibly with the kinematic constraints. Furthermore, the representation of passive motion as an incremental process can be integrated to provide global inverse-kinematic functions (Mussa-Ivaldi and Hogan, 1989). Therefore, these solutions do not depend upon the path that is followed to reach an endpoint location from a given starting point. A significant aspect of passive motion is that its integrability can be entirely attributed to a specific operator, the gamma matrix, $\Gamma$. This operator is equivalent to a joint-stiffness correction that can be applied to a virtual-trajectory controller. Thus, it is possible to generate active motion which mimics the integrable passive behavior.



ACKNOWLEDGMENTS: This work was supported by the National Institute of Neurological Disease and Stroke Research, Grant NS09343, and by the Office of Naval Research, Grant N00014/88/k/0372.



References 

Baillieul, J, Hollerbach, J, M and Brockett, R, W (1984) Programming and Control of Kinematically 
Redundant Manipulators. Proc. of the 23rd IEEE Conf on Decision and Control, Las Vegas, 
Nevada, Dec 12-14: 768-774 

Ben-Israel, A and Greville, T, N (1980) Generalized Inverses: Theory and Applications. R. E. Krieger Publishing Co., New York

Bizzi, E, Polit, A and Morasso, P (1976) Mechanisms Underlying Achievement of Final Head 
Position. J. Neurophysiol. 39:435-444 

Bizzi, E, Accornero, N, Chapple, W and Hogan, N (1984) Posture Control and Trajectory Formation During Arm Movement. J. of Neurosci. 4:2738-2744

Brooks, T, L (1982) Optimal Path Generation for Cooperating or Redundant Manipulators. Proc 2nd 
Int. Computing and Engineering Conf., San Diego, CA: 119-122 

Feldman, A, G (1966) Functional Tuning of Nervous System with Control of Movement or 
Maintenance of a Steady Posture. II. Controllable Parameters of the Muscles. III. 
Mechanographic Analysis of the Execution by Man of the Simplest Motor Task. Biophysics 
11:565-578:766-775 

Flash, T (1987) The Control of Hand Equilibrium Trajectories in Multi-joint Arm Movements. Biol. 
Cybern. 57:257-274 







Gordon, A, M, Huxley, A, F and Julian, F, J (1966) The Variation in Isometric Tension with Sarcomere Length in Vertebrate Muscle Fibers. J. Physiol. 184:170-192

Hogan, N (1984) An Organising Principle for a Class of Voluntary Movements. J. Neurosci. 
4:2745-2754 

Hogan, N (1985) The Mechanics of Multi-joint Posture and Movement Control. Biological 
Cybernetics 53:1-17 

Hollerbach, J, M and Suh, K, C (1985) Redundancy Resolution of Manipulators Through Torque Optimization. Proc. IEEE Intl. Conf. on Robotics and Automation, St. Louis, MO: 1016-1021

Hopfield, J, J (1982) Neural Networks and Physical Systems With Emergent Collective Computational Abilities. Proc. Natl. Acad. of Sciences, USA, 79:2554-2558

Huxley, H, E (1969) The Mechanism of Muscular Contraction. Science 164:1356-1366 

Khatib, O, Dynamic Control of Manipulators in Operational Space. 6th IFToMM Congress on Theory of Machines and Mechanisms, New Delhi, December 15-20

Klein, C, A and Huang, C, H (1983) Review of Pseudoinverse Control for Use with Kinematically Redundant Manipulators. IEEE Trans. Systems, Man and Cybern. SMC-13: 245-250

Liegeois, A (1977) Automatic Supervisory Control of the Configuration and Behavior of Multibody Mechanisms. IEEE Trans. Systems, Man and Cybernetics SMC-7: 868-871

McKeon, B, Hogan, N and Bizzi E (1984) Effect of Temporary Path Constraint During Planar Arm 
Movements. Abstr. 14th Ann. Conf. Soc. for Neurosci. (Anaheim CA Oct. 10-15): 337 

Mussa-Ivaldi, F, A, Hogan, N and Bizzi, E (1985) Neural, Mechanical and Geometric Factors Subserving Arm Posture in Humans. J. of Neurosci. 5(10):2732-2743

Mussa-Ivaldi, F, A, Morasso, P and Zaccaria, R (1988) Kinematic Networks. A Distributed Model for Representing and Regularizing Motor Redundancy. Biol. Cybern. 60:1-16

Mussa-Ivaldi, F, A and Hogan, N (1989) Solving Kinematic Redundancy with Impedance Control: a Class of Integrable Pseudoinverses. Proc. IEEE Intl. Conf. on Robotics and Automation, Scottsdale, AZ: 283-288

Nichols, T, R and Houk, J, C (1976) Improvement in Linearity and Regulation of Stiffness That Results from Actions of Stretch Reflex. J. Neurophysiol. 39:119-142

Rack, P, M, H and Westbury, D, R (1969) The Effects of Length and Stimulus Rate on Tension in 
the Isometric Cat Soleus Muscle. J. Physiol. 204:443-460 

Wampler, C, W (1987) Inverse Kinematic Functions for Redundant Manipulators. Proc. IEEE Intl. Conf. on Robotics and Automation, Raleigh, NC: 610-617




Motion Control in Intelligent Machines 

A. Meystel 



Department of Electrical and Computer Engineering 
Drexel University, Philadelphia, PA 19104 

Abstract 

Motion control in Intelligent Machines is a part of their system of Intelligence. The system of Intelligence is viewed as a nested heterarchical structure with control loops associated with its nested mechanism of decision making. The NICS philosophy is outlined, and an attempt is made to consider an operator of information representation and refinement serving the decision-making processes. Different task-decomposition routines are considered, supporting decision making within a level of resolution, together with the decision-making processes which determine the interaction of adjacent resolution levels. A NICS operator is introduced that is applicable to systems of Intelligent Control.



1. Introduction 

1.1 Motivation 

Motion control in intelligent machines is a part of their overall functioning. So far, there is no
uniform, well-established approach to the functioning of intelligent machines. The Nested
Intelligent Control Structure (NICS) proposed in this paper is inspired by, and further develops, the
nested approach to decision making in NASREM. NASREM was first introduced in [1], and its later
developments were presented in [2-4]. NASREM has proven applicable to a wide variety of systems
in Robotics and Computer Integrated Manufacturing.

The core of the NASREM concept can be formulated as follows: any system can be
modeled as a sequence of three subsystems¹: Sensory Processing (Perception), World Model
(Knowledge Organization), and Task Decomposition (Planning/Control)². Each of these
subsystems is a hierarchy, and all three hierarchies communicate horizontally at different levels of
resolution. The Goal of functioning arrives at the top and is subsequently decomposed into a
hierarchy of tasks, and Action presumably emerges at the bottom of the Task Decomposition
subsystem. External information presumably arrives at the bottom of the pyramid of sensory
processing. All subsystems are connected in some way to the Global Memory.

¹ Structurally, these subsystems are built as heterarchies, i.e. hierarchies with links
among the nodes at each level of the structure. Whenever we speak of hierarchies, we mean
heterarchies, in order to underline their distinguishing property: an interrelated horizontal
structure at each particular level.

² The planning/control system is supposed to produce the heterarchy of task decomposition.
A string of tasks at each level of the hierarchy is what we call "the control law" for this level.
So, in an intelligent machine the task decomposition is never given to the controller: it
must be generated by the controller.

The information processing hierarchies are built according to the general views presented in [5].
The interaction among the three subsystems (Perception, Knowledge Representation, and
Planning/Control) resembles the processes characteristic of the structures described in [6]. However,
the original NASREM architecture [1-4] does not elaborate on the Planning/Control processes,
referring to them as processes of Task Decomposition, which is understood as a set of purely
inter-level activities. Moreover, although NASREM presumes real closed-loop systems, it is
considered only for the high-resolution level. In general, the control part of NASREM is not
described in detail; for example, it never distinguishes between the feedforward and feedback parts
of the system's functioning. Finally, it is usually overlooked that the NASREM methodology can
(and should) be applied recursively to each component of its own structure.

1.2 Organization of the Paper

This paper focuses on the following five issues: 1) a concept of the intelligent machine based upon the
idea of intelligence, 2) general foundations of knowledge representation in intelligent machines, 3)
peculiarities of the hierarchy of task decomposition, 4) synthesis of task strings, or controls
(planning/control processes), and 5) the closed-loop character of operation. This allows for building
a model of a Nested Closed-Loop Control Structure based on NASREM-like subsystems of
Knowledge Representation, Knowledge Acquisition, and Decision Making. We will call it NICS:
Nested Intelligent Control Structure. The NICS structure can be considered explanatory for
cognitive processes and is instrumental for the design of Intelligent Control Systems. Section 2
focuses on the commonality of metaphors used for modeling similar processes in machines and
living organisms. Techniques for dealing with unstructured and unclear knowledge are addressed in
Section 3. Different types of task decomposition and the synthesis of decisions are considered in
Section 4. The generation of feedforward controls and the shaping of external feedback loops (real
and conceptual) are discussed in Section 5. Finally, Section 6 discusses future problems for
systems based upon a multiplicity of NICSs.

2. Intelligent System 

2.1 Intelligence

An Intelligent System is defined here as a system governed by an Intelligent Controller. A system
is understood as a collection of objects together with the relationships among these objects. This
definition is recursive and can be applied to any object within the system that can be considered a
system in its own right. Thus a mechanism of generalization is implied, which allows the inner
mechanics of a system to be neglected when it is considered as an object. In other words, a
multiresolutional technique of analysis is applied, which entails multiresolutional knowledge
acquisition and representation, multiresolutional decision making, etc.
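The recursive definition above, in which any object may itself be a system and generalization lets its inner mechanics be neglected, can be sketched as a small data structure. This is a minimal illustration under our own assumptions: the class `SystemNode`, the method `view`, and the use of a depth cutoff to stand in for a level of resolution are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SystemNode:
    """An object that may itself be a system of objects with relations (hypothetical)."""
    name: str
    parts: list[SystemNode] = field(default_factory=list)
    relations: list[tuple[str, str]] = field(default_factory=list)

    def view(self, depth: int) -> dict:
        """Describe the system down to `depth` levels; deeper structure is generalized away.

        depth == 0 treats this node as an opaque object: its inner mechanics are neglected.
        """
        if depth == 0 or not self.parts:
            return {"object": self.name}
        return {
            "system": self.name,
            "parts": [p.view(depth - 1) for p in self.parts],
            "relations": self.relations,
        }


# A robot whose arm is itself a system of joints.
arm = SystemNode("arm", [SystemNode("shoulder"), SystemNode("elbow")], [("shoulder", "elbow")])
robot = SystemNode("robot", [arm, SystemNode("base")], [("base", "arm")])

print(robot.view(1))  # the arm appears as a single object
print(robot.view(2))  # the arm's joints become visible
```

Each call to `view` with a smaller depth gives a coarser, lower-resolution description of the same system, which is the sense in which one recursive definition yields a multiresolutional analysis.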

An Intelligent System is postulated to be governed so as to achieve its goal. The goal can be
assigned externally or generated within the system. The device governing the Intelligent System
will be called an Intelligent Controller. Control is understood as a subsystem (its activities and/or
its output) that serves to generate the plans, tasks, subtasks, and commands (more precisely,
control plans, control tasks, control subtasks, and control commands) required for the proper
functioning of the system. This subsystem can be distributed over the whole intelligent machine
under consideration. Intelligent Control is defined as Control generated by a controller with the
property of Intelligence. Definitions of Intelligence can be found in [59].

Robotic Intelligence (or Machine Intelligence³) can be considered a property of the robot's
computer controller, or of the computer controller of an Intelligent Machine, to perform
assignments in an uncertain environment with no human involvement (level 1), and to
independently develop new assignments for subsequent activities (level 2). The term
"uncertain" should be understood in the sense of "not accounted for": not predicted by the
assignment (level 1), or by the stored knowledge as well as by the assignment (level 2). Clearly,
both levels of intelligence are associated with certain degrees of perception, knowledge
organization, and decision making (see Figure 1). The structure is divided into two devices: the
intelligent controller computer and the system to be controlled. Later we will see that this division
is equivalent to dividing the whole world into two closely related Worlds: the WORLD OF
REFLECTION, existing only in the IC's (brain's) imagination, and the WORLD OF REALITY, which
exists as a hardware set. (Interestingly, these two worlds are mirror reflections of each other.)

The following definition of intelligence will be accepted for the subsequent presentation:
intelligence is the ability to cope successfully with specified as well as
nonspecified problems. In the area of NC machines, most circumstances are presumed to
be specified. In robotics, when the goal of operation is assigned, something always remains
unspecified. If the problem is scrupulously specified, the environment may still be incompletely
specified. If both the problem and the environment are specified, part of the specification
invariably remains at a level of generality that leaves many details unspecified and unaccounted
for.

2.2 Issues of Interest 

Assignment is understood as a set of outputs of the goal generator (external and/or internal).
The external environment is the part of the "world" that the Intelligent Controller (IC) can
control only via "actions"; it does not receive commands from the IC directly. On the other hand,
these actions can be considered "commands", and the external environment then becomes a part
of the control model. Actions imply change, which creates the need for the notion of "state" and,
probably, of a "state space" for describing the system and its environment as a string of
"snapshots". The system is then controlled by the IC, which has the goal as its assignment and
generates the inputs for the system. The system, in turn, actuates the control profiles generated
by the IC and affects the environment through the "actions" produced by its actuators. This
individual intelligent machine contains subsystems, which in turn are divided into parts down to
primitives that cannot be divided any further. On the other hand, a particular intelligent machine
can be a part of a team of similar intelligent machines. All teams constitute a "population", which
can be considered a species.

³ A discussion of the differences between human and machine intelligence is beyond
the scope of this paper. This issue is briefly addressed in [59].

Figure 1. General Structure of the Intelligent Controller (or Intelligent Machine with a Controller).

A multiresolutional system can be introduced consisting of a Species-Group(s)-Individual(s)-Organ(s)-Cell(s)
hierarchy (SGIOC), which is driven by a corresponding hierarchy of goals. In more
technological terms, it can be formulated as a hierarchy of "kind of intelligent machines - teams of
intelligent machines - intelligent machine - subsystem (sub-assembly) of an intelligent machine -
components - primitive units". We will probably need to consider an IC for all levels of the SGIOC
hierarchy (the global IC-center of a species, the team IC, the IC of an individual intelligent
machine, and the decision-making devices (IC substitutes) of organs and cells). Clearly, in this
context the mechanism of self-reproduction can be considered a regular issue for the area of
intelligent machines.

The system shown in Figure 1 should be addressed in more detail. One can expect the system of
values to play a distinctive role as a part of the knowledge base. We assume that, based upon the
perception of the results of its functioning, an Intelligent Machine develops a system of Values
(an initial notion of goodness is presumed to be given to an Intelligent Machine at the design
stage). This process of values development is shown in Figure 2 (see the E-V triangle, where
E stands for "Experience" and V stands for "Values"). On the other hand, the system of values,
together with the Knowledge Base, carves out of the base the sub-heterarchy of the Focus of
Attention (A) and senses a conflict within the A-heterarchy (see the A-C triangle in Figure 2). The
A-C heterarchy is good material for decision-making activities. Thus, decision-making
formations (DMF) can be introduced at the corresponding levels of the SGIOC hierarchy, and this
will help to put into proper perspective everything introduced later in the paper (for more on
SGIOC hierarchies, see [60-63]).

One can expect that in the SGIOC hierarchy all levels have their own particular language (for
representation and information exchange). These languages form their own hierarchy, and
different languages might be expected both horizontally and vertically. This is a necessary as well
as a sufficient condition, because once a language is defined, one can determine its semantics,
express it as a set of rules, and store these rules for each level of the SGIOC hierarchy; the storages
of all levels together form a hierarchy of knowledge, which is in fact the union of the World Model
(Representation) and the Intelligent Machine Model (WM+IMM). The following is assumed:

World Representation (at any time instant) <==> Knowledge Representation (at any time instant).

As soon as we consider a representation as a time-dependent string of snapshots, we can write

World Representation (over an interval) <==> Event Representation.

The following issues should be addressed: a) the structure and functioning of the decision-making
formations (DMFs), which are a rather important, if not the major, part of the intelligent
module (IM); b) the means of communication, without which an IM cannot be shown as a
component of the SGIOC hierarchy; and c) the generation of values, which is required for proper
decision making.

Figure 2. Experience-Values-Focus of Attention-Conflict Discovery in the structure of Intelligent
Controller.

3. Knowledge Organization 

3.1 Multiresolutional Knowledge Processing 

A concept of nested hierarchical (multiresolutional, pyramidal) information (knowledge)
processing (MRKP) is becoming increasingly important in the area of intelligent machines,
including robotics, computer vision, and knowledge-based material processing. Multiresolutional
Knowledge Representation is defined as the union of all monoresolutional representations. The
main idea of this concept is that an applicable model of a system cannot be built unless the
system is considered simultaneously at several levels of resolution. Resolution is defined by the
minimum volume of the state space that is distinguishable within a particular system of
representation; this minimum volume is called a tessella. The organization (discretization,
quantization) of the state space is called a tessellation if a tessella of a particular size is used
efficiently as the element for building all descriptions of interest. A concurrent consideration of
the system at several resolution levels is required, and a redundant representation, in which the
"same" thing is represented several times with different resolution, is justified [1].
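As a concrete illustration of tessellae and tessellation, the sketch below quantizes a two-dimensional state space into square tessellae whose size halves at each finer level, so that the same point receives a redundant description at every resolution. The unit-square state space, the halving rule, and the function names are our own assumptions for illustration, not part of the original text.

```python
def tessellate(point, level, origin=(0.0, 0.0), extent=1.0):
    """Return the index of the tessella containing `point` at a given resolution level.

    Level 0 is the whole state space; each finer level halves the tessella size
    (a hypothetical refinement rule chosen for illustration).
    """
    size = extent / (2 ** level)           # edge length of one tessella at this level
    n = 2 ** level                         # number of tessellae per axis
    i = int((point[0] - origin[0]) // size)
    j = int((point[1] - origin[1]) // size)
    return (min(i, n - 1), min(j, n - 1))  # clamp points lying on the upper boundary


def multiresolutional_description(point, levels):
    """The same point, represented redundantly at every resolution level."""
    return {k: tessellate(point, k) for k in range(levels)}


print(multiresolutional_description((0.7, 0.2), 4))
```

In this toy setting the levels are mutually consistent: halving the indices of a fine tessella yields its parent at the next coarser level.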

An example of a multiresolutional process description is the structure for a physical process
shown in Figure 3. A definite technological process (say, metal casting, or the assembly of a
mechanical device) can be described with a definite resolution as a sequential/parallel network of
subprocesses, i.e. phenomenological units (Ph1 through Ph5). If each of these phenomenological
units is to be discussed at a higher resolution level (using "sub-phenomenological" units), it may
very well happen that the descriptions at this resolution level "cannot talk to each other". Either
the levels of resolution do not match, or the vocabularies are incompatible, or the physical
processes under consideration belong to different domains; in any case, a consistent model of the
overall process cannot be delivered at a sufficiently high level of resolution.

A notion of multiresolutional knowledge representation (MKR) is introduced for a variety
of systems, including data and/or knowledge bases, vision, control, and manufacturing systems,
industrial automated robots, and (self-programmed) autonomous intelligent machines. Most of
these applications actually or presumably use intelligent modules with decision-making
capabilities (or human operators performing similar functions). The structure of the intelligent
module is described in [6]. MKR is derived directly from the entity-relational representation of a
system, which uses a number of postulates of representation. Some of these postulates establish
a graph representation for the system of interest that includes all levels of resolution, since it
contains not only the systems represented but also the nested system of their components.
Another postulate presumes that classes can be recognized among the multiplicity of labels,
namely among commensurable labels, i.e. labels belonging to the same space of consideration.

Figure 3. Phenomenological decomposition of a process
One can see that the structure can be visualized as a set of interrelated scope graphs

$$\bigcup_i G_i, \qquad \bigcup_{i,j} G_i R_{ij} G_j; \qquad i,j = \mathrm{I},\mathrm{II},\mathrm{III},\ i \neq j,$$

where $R_{ij}$ is a relation among the elements of the graphs. Each of the scope graphs has a set of
vertical (hierarchical) connections between the resolution levels, and this set of connections is
called a hierarchy of the scope. Within each level of resolution there exists an entity-relational
graph (tessellation), which represents all entities and the relations among them at a particular
resolution (tessella). All tessellata belong to a particular hierarchy and are considered together
with it:

$$G_i = \bigcup_{k} T_{ik}\, R_{T_i}(k, k+1), \qquad k = 1,\dots,n,\quad i = \mathrm{I},\mathrm{II},\mathrm{III}$$

($n$ is the number of resolution levels). Each $G_i$ unifies the set of inclusions for the tessellata

$$G_i = \bigcup \left[ T_{ik} \supset_g T_{i(k-1)} \supset_g T_{i(k-2)} \supset_g \cdots \supset_g T_{i(k-n)} \right],$$

where the inclusions represent the relations $R$ and the subscript $g$ marks generalization. Any
tessellatum of a higher resolution level can be transformed into the tessellatum of a lower level
via the mechanism of generalization (abstraction). The set of all hierarchies, with all tessellata
related to each of the hierarchies, forms a heterostructure (see the D-structure in [7]).
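The mechanism of generalization, which transforms a tessellatum of higher resolution into one of lower resolution, can be illustrated numerically. In this minimal sketch, which rests on our own assumptions, a tessellatum is a square grid of values and generalization merges each 2x2 block of fine tessellae into one coarse tessella by averaging; averaging is our choice of abstraction operator, not one prescribed by the text.

```python
def generalize(grid):
    """Map a 2^k x 2^k tessellatum onto the 2^(k-1) x 2^(k-1) one by averaging 2x2 blocks."""
    n = len(grid) // 2
    return [
        [
            (grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4
            for c in range(n)
        ]
        for r in range(n)
    ]


def hierarchy(grid):
    """Build the chain T_k, T_(k-1), ..., T_0 by repeated generalization."""
    chain = [grid]
    while len(chain[-1]) > 1:
        chain.append(generalize(chain[-1]))
    return chain


fine = [
    [1.0, 3.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 4.0],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 2.0, 8.0, 8.0],
]
for level in hierarchy(fine):
    print(level)
```

Different fine tessellata can generalize to the same coarse one (for example, `[[0, 4], [0, 0]]` and `[[1, 1], [1, 1]]` both become `[[1.0]]`), which shows why such a transformation of knowledge can be irreversible.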

A general paradigm of multiresolutional control systems is presented in [40]. Any intelligent
module transforms (sometimes irreversibly) the knowledge it deals with, and this transformation
affects the subsequent computation processes, e.g. those of decision and control. Several types of
knowledge transformation are reviewed. One of them, called knowledge filtering (KF), can be
characterized by its volume and rate. The detrimental effect of KF can be compensated by a
corresponding level of knowledge redundancy (and by the subsequent redundancy of decision-making
processes, followed by action redundancies as well).

MKR allows the system to be coded as a whole, rather than as a selected, limited subset of it.
This allows for harmonious control of the system. An example of using an MRKP system for the
intelligent control of the OSPREY process in metallurgy is described in [39]. Another system is
currently being developed for a plasma deposition machine.

A structure of GMKP operates as follows. It is presumed that the sub-object (SO) under
consideration is a part of an object, which in turn is a part of a particular Domain, which finally is
a part of the World.

Information concerned with the SO (ISO) is obtained through the set of available sensors. The
Sensor Information Carrier (SIC) delivers ISO to the MRKP system in a form that contains
information about the code carried by this particular SIC and about the modality of this particular
sensor. The code contains the label and the value; this information must be decoded, and
inference is performed, after which all information is structured and stored. The process of
labeling and storing is actually the process of attaching (putting in correspondence) the newly
arrived information to the bulk of knowledge previously stored and verified. It is this process
that turns the information into knowledge.

As soon as the modality of the sensor information becomes known, a particular Domain of
the World Knowledge is evoked, and the mechanism of interpretation is prepared by taking into
account the context and listing the available rules that the system can use for dealing with the
decoded information. New rules for interpretation can be obtained, which can in fact affect the
process of interpretation and inference and change prior (recent) results. The SO generates all
sources of uncertainty: error of measurement (E), uncertainty of incompleteness (I), and
uncertainty of redundancy (R). New EIR-uncertainties are generated within the code as a result of
coding and communication, within the interpretation as a result of the EIR properties of
interpretation, and within the storage as a result of the EIR properties of classification and other
tools of information organization. All these factors should be taken into account when the degrees
of belief are determined. Usually they are generated within the loop "learning - interpretation -
storage".
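The flow just described can be sketched in a few lines: a sensor information carrier delivers a labelled, coded value with a known modality; the modality evokes a domain and its interpretation rules; and the interpreted result is attached to previously stored knowledge. Everything here is hypothetical: the `SIC` record, the `DOMAIN_RULES` table, and the unit conversion used as an "interpretation rule" are illustrative assumptions, and uncertainty is reduced to a single measurement-error figure rather than the full E, I, R breakdown.

```python
from dataclasses import dataclass


@dataclass
class SIC:
    """Hypothetical Sensor Information Carrier: a coded (label, value) plus the sensor modality."""
    label: str
    value: float
    modality: str
    error: float  # measurement error only (the E component of EIR uncertainty)


# The modality evokes a domain of world knowledge with its interpretation rules (hypothetical).
DOMAIN_RULES = {
    "thermal": {"domain": "heat transfer", "interpret": lambda v: v - 273.15},  # kelvin -> deg C
    "acoustic": {"domain": "vibration", "interpret": lambda v: v},
}


def process(sic, knowledge):
    """Decode and interpret an SIC, then attach the result to the stored knowledge."""
    rules = DOMAIN_RULES[sic.modality]          # modality known -> domain evoked
    interpreted = rules["interpret"](sic.value)  # apply the domain's interpretation rule
    entry = knowledge.setdefault((rules["domain"], sic.label), [])
    entry.append((interpreted, sic.error))      # attachment to previously stored items
    return interpreted


kb = {}
process(SIC("melt_temp", 1800.0, "thermal", 5.0), kb)
process(SIC("melt_temp", 1810.0, "thermal", 5.0), kb)
print(kb)
```

Keying the store by (domain, label) is one simple way to realize "putting in correspondence" new readings with earlier ones; a fuller treatment would also update degrees of belief from the accumulated entries.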

3.2 Overview of the Situation in the Area of MRKP 

MKR and the associated techniques of MRKP have developed rapidly during the past two decades
from three different viewpoints (with fuzzy boundaries among them): hardware MKR, visual image
MKR, and algorithmic MKR. Effective use of the multilevel, multilanguage structure of a computer
is possible only if this multilevel structure is constructed by methods of aggregation
(generalization, abstraction) and decomposition (instantiation) [8,9]. This area is linked with the
problem of partitioning systems so as to achieve maximum efficiency. A proper distribution of
resolution among subsystems should provide the best utilization of equipment [6,10].

Another MKR problem adjacent to the problem of hardware partitioning was how to partition
something that has not been previously assembled (e.g. the partitioning of a curve) [16]. It was
determined that the following factors must be taken into account: the digitization and/or
resolution of the representation at hand, the existence of multiple "views", and the set of
attributes usable for describing the object to be partitioned. All these approaches are undeniably
linked to the "frame approach" from AI and to the aggregation/decomposition methodologies of
earlier scientists belonging to the General Systems Theory school of thought (e.g. see [12]). A
method of multiresolutional curve representation is presented in [13]. Pyramid theories of image
processing and interpretation have been promulgated during the last two decades in a multiplicity
of well-known books and papers by L. Uhr, E. Riseman, A. Hanson, S. Tanimoto, T. Pavlidis,
M. Levine, R. Bajcsy, P. Burt, and A. Rosenfeld [14-19]. The decomposition of entities is
determined by the focus of attention at each level, as illustrated by this sequence:

level of resolution => context => detail (tessella) => focus of attention => context
=> next level of resolution
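The sequence above (level of resolution, context, detail, focus of attention, next level) can be read as a coarse-to-fine search. The sketch below is a minimal illustration, assuming a one-dimensional context and a scoring function `f` that stands in for whatever criterion drives attention; both are hypothetical, and coarse-to-fine search is our chosen stand-in for the decomposition process, not a procedure given in the text.

```python
def coarse_to_fine(f, lo=0.0, hi=1.0, splits=4, levels=5):
    """Refine the focus of attention level by level.

    At each level the current context [lo, hi] is divided into `splits` tessellae;
    the tessella whose centre scores best under f becomes the focus of attention
    and therefore the context for the next (finer) level.
    """
    trace = []
    for _ in range(levels):
        size = (hi - lo) / splits
        centres = [lo + (k + 0.5) * size for k in range(splits)]
        best = max(range(splits), key=lambda k: f(centres[k]))
        lo, hi = lo + best * size, lo + (best + 1) * size
        trace.append((lo, hi))
    return trace


for context in coarse_to_fine(lambda x: -(x - 0.3) ** 2):
    print(context)
```

Each context lies inside the previous one, so only one tessella per level is examined in detail. The price of this economy is that a poor choice at a coarse level cannot be undone later, which is one reason redundant representations matter.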







The well-known quadtree structure [20] is not a multiresolutional structure in the sense that the
accuracy of representation is the same at each level: the highest accuracy available at the level
with the highest resolution. Recently, there was an attempt to fuzzify the images at the upper
levels when planning was attempted using a quadtree as an MKR system [21]. A truly MKR
approach, using all tessellata for planning, was successfully employed in [38]. It turned out that
the set of hierarchical connections (those of the $G_i$ type) forms a "skeleton" that can be used as a
good enough "syntactic" representation of various complicated shapes [23,24]. This phenomenon
seems to have explanations within the principles of human perception reflected in the biological
structure of the vision system [25]. Multiresolutional representation also turned out to be useful for
image segmentation and region matching [26,27].

MRKP is akin to the fractal methodology of world representation [28]. The multiple-scale
approach to image representation and analysis [29], together with fractal-based techniques, is
actually an application of the set of ideas characteristic of MKR. Here we are dealing with the
simultaneous representation of all images at all resolutions, where the mechanism of generalization
(or abstraction) is imposed upon the system by an external mathematical model. The last group of
MRKP results is related to multiresolutional algorithms. The algorithms of continued fractions
[30,31] are somewhat interlaced with the fractal methodology. Multiresolutional relaxation
algorithms have been recommended for dealing efficiently with texture [32]. A consistent and
complete overview of multigrid relaxation algorithms for image processing can be found in
[33]. A more detailed description of the operations is given in [37].

3.3 Dealing with Unstructured Knowledge 

One of the key problems in the area of intelligent control is dealing with unstructured
and/or hard-to-represent media. Even if the process variables are available, the model of the
process is typically incomplete and fragmentary. The output of the control process cannot be
monitored directly; it allows only for post-factum measurement and off-line learning. A research
effort has been undertaken at Drexel University to make the following contribution to the theory
of intelligent control: an advanced controller architecture based upon a new principle of the
knowledge inverse⁴, with an original combination of feedforward and feedback control principles
using a multiresolutional state representation.

A new method of knowledge representation and modeling has been developed for processes
with incomplete and/or inadequate information support, in which the consistency of the model is
achieved by generating visual, acoustic, and thermal images blended with data from the
conventional set of sensors. A computer vision system is being developed to supply the
information lacking for a consistent model of the process; this system is based on a novel principle
of multiresolutional image processing, in which conventional computer vision procedures are
replaced by a set of algorithms specifically proposed for dealing with unstructured and/or
hard-to-represent media. A multiresolutional learning system is also being developed, with
communication between adjacent levels of resolution through a hierarchical network of long-term
learning and short-term adaptive channels, which provide constant adjustment of the knowledge
inverse operator as well as compensation for parameter deviations [39, 57, 58].

⁴ The Knowledge Inverse Operator in knowledge-based controllers is similar to the
system inverse in the classical theory of linear systems.
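Footnote 4 likens the knowledge inverse to the system inverse of linear systems theory. The sketch below illustrates that analogy under strong simplifying assumptions of our own: a static scalar plant y = a·u + d, an exactly known gain, and an unknown offset d that is learned post factum. The controller feeds forward through the inverse of its current model and uses each measured output to adjust that inverse. The plant, the adaptation law, and all names are hypothetical; this is not the Drexel controller.

```python
def run_knowledge_inverse(reference, steps=20, a=2.0, d=0.5, a_hat=2.0, d_hat=0.0, rate=0.5):
    """Feedforward through an inverse model, corrected from post-factum measurements.

    True plant (unknown to the controller): y = a * u + d
    Controller's model:                     y = a_hat * u + d_hat
    Feedforward ("knowledge inverse"):      u = (reference - d_hat) / a_hat
    Feedback: after each step the measured output updates d_hat (hypothetical law).
    Returns the tracking error reference - y at every step.
    """
    errors = []
    for _ in range(steps):
        u = (reference - d_hat) / a_hat             # invert the current model
        y = a * u + d                               # output is measured only afterwards
        d_hat += rate * (y - (a_hat * u + d_hat))   # reduce the model mismatch
        errors.append(reference - y)
    return errors


print(run_knowledge_inverse(1.0, steps=5))
```

With `a_hat == a`, the tracking error equals `d_hat - d` and shrinks by the factor `1 - rate` at every step, so with the defaults it halves each time. The feedforward part does the work, and the feedback part only keeps the inverse accurate.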

Proper conditions should be satisfied for the organization and processing of redundant
information (knowledge) in multiresolutional systems. Providing a sufficient degree of
redundancy is one of these conditions. A definite set of rules for incorporating the redundant
information (knowledge) must be applied for the system to function properly. The significance of
theoretically explainable techniques for dealing with the redundancy of information (knowledge) is
often overlooked. Several operators that implicitly use the redundancy of information (knowledge)
are discussed in [6]: generalization (abstraction), focusing of attention, etc. Phenomena of
multiresolutional redundant perceptual organization are linked with the phenomena of error
propagation (see [36]).

3.4 Uncertainty Generation and Propagation in MKR and MRKP 

The multiresolutional system of Knowledge Representation (MKR) and processing (MRKP) is based
upon the postulates formulated in [39]. All of these postulates establish the representation as a body
that must be uncertain. Indeed, after the alternatives of a future decision are constructed (whether in
design problems or in planning/control problems), these alternatives have to be compared.
A consistent comparison can be made only if a judgment is developed about the uncertainty of the
evaluation of our alternatives. The set of alternatives, each with a definite probability of occurrence,
is presumably obtained by the combinatorial methods discussed in the area of automated theory
generation (ATG) [34,35].

Kolmogorov's six axioms [41] are a mechanism for making judgments on the alternatives of a
decision. We will question the validity of Kolmogorov's Axioms 5 and 6 for the case of MKR.
Indeed, the condition $\bigcap_n A_n = \emptyset$ means that the events (sets) under consideration
are incompatible. However, the infinite inclusion $A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots$
does not necessarily require that $\lim_{n \to \infty} P(A_n) = 0$. Everything depends on the
interpretation of inclusion. This becomes especially important when the process of consecutive
generalization is considered.
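For reference, Axioms 5 and 6 in their usual form read as follows, for events $A$, $B$, $A_n$ of a field of events with probability $P$:

$$\text{(5)}\quad A \cap B = \emptyset \;\Rightarrow\; P(A \cup B) = P(A) + P(B)$$

$$\text{(6)}\quad A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots,\quad \bigcap_{n} A_n = \emptyset \;\Rightarrow\; \lim_{n \to \infty} P(A_n) = 0$$

Given Axioms 1 to 5, Axiom 6 (the continuity axiom) is equivalent to the countable additivity of $P$.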

We address these questions by offering a general approach to the processes of error propagation in
the system and by recommending measures for its reduction. We show that no judgment of error
can be made before the system is organized as a hierarchy of generalizations. Thus,
resolution-conscious hierarchical information should be sought from the SO. Then, the arriving
Code should be considered a generalized code that allows for nested hierarchical treatment
(recursive interpretation and/or consecutive refinement).

Now the storage becomes a multiresolutional system, and the whole right side of the structure
is adjusted using the methodology of [6]: the source of knowledge is treated as a multiresolutional
structure, the rules constitute a hierarchy of classes and a hierarchy of rules within each class, and,
finally, the processes of learning are carried out consecutively, with the gradual involvement of each
consecutive tessellatum. The following conceptual structure is then required to support the MRKP
system in dealing with the processes of error generation and reduction. The whole processing is
considered as a multiresolutional system of consecutive encoding/decoding procedures. In a
number of cases a hierarchy of sensors can be expected, which makes the encoding subsystem
work with a multiplicity of inputs at all levels.

4. Mechanisms of Decision Making: Task Generation 

4.1 Synthesis of Decisions and Decision Making

The structure of the Intelligent Controller can now be visualized as shown in Figure 4. It repeats the
diagrams shown in Figures 1 and 2, with the difference that the multiresolutional organization of the
subsystems of Perception, Knowledge Base, and Decision Making (Planning/Control) is shown
explicitly. (One can see that we consider the planning/control continuum to be the result of the
operation of the decision-making subsystem.) A plan is defined as a set of generalized activities to
be performed to achieve a particular goal. A plan can be refined, and the activities to be performed
can be described in more detail. This process of refinement can be repeated many times until, for
the whole plan (or for the part of it that is well supported by the available information), the
description of the required activities is as precise as necessary. This precise assignment is
considered to be a reference trajectory, which can then be used in the execution controller (see [6]
and Figure 5). The structure in Figure 4 implies the use of three information heterarchies, $H_G$,
$H_S$, and $H_T$, which denote the heterarchies of goal⁵, current state, and task, respectively.

The couple $(H_G, H_S)$ then constitutes the problem to be solved, and $H_T$ is a solution of this
problem. The planning/control algorithm (P/C) is supposed to produce the transformation

$$P/C:\ (H_G, H_S) \rightarrow H_T \qquad (1)$$

The hierarchy of task decomposition is never given: it must be generated by the controller. How?
Indeed, the plan (a set of tasks at a level) forms a solution for a problem formulated as a task of the
upper level. But where do all these solutions come from? In other words, what mechanism for (1)
can be recommended throughout the NICS hierarchy?
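A toy version of transformation (1), P/C: (H_G, H_S) → H_T, helps make the question concrete. In the sketch below, which is a minimal illustration under our own assumptions, the goal is refined level by level into strings of tasks, and the current state prunes tasks that are already achieved. Unlike the intelligent controller discussed in the text, this sketch reads its decomposition from a fixed, hypothetical table (`REFINEMENT`); generating that table is exactly the open question raised above.

```python
# Hypothetical task vocabulary: how a task at one level is refined into a string of
# tasks at the next, finer level. Tasks missing from the table are treated as primitive.
REFINEMENT = {
    "assemble_part": ["fetch_part", "place_part"],
    "fetch_part": ["move_to_bin", "grasp"],
    "place_part": ["move_to_fixture", "release"],
}


def plan_control(goal, state, level=0, hierarchy=None):
    """Toy P/C transformation (H_G, H_S) -> H_T.

    Returns the task hierarchy as {level: string of tasks at that level}.
    Tasks already listed as done in the current state are not planned again.
    """
    hierarchy = {} if hierarchy is None else hierarchy
    if goal in state.get("done", set()):
        return hierarchy  # already achieved in the current state: nothing to refine
    hierarchy.setdefault(level, []).append(goal)
    for subtask in REFINEMENT.get(goal, []):
        plan_control(subtask, state, level + 1, hierarchy)
    return hierarchy


for level, tasks in plan_control("assemble_part", state={"done": {"fetch_part"}}).items():
    print(level, tasks)
```

Each level of the returned dictionary is a string of tasks, i.e. what footnote 2 calls the control law for that level.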

The core of operation of the intelligent module, in the part dealing with the goal heterarchy, does
not differ from the process of automated theory generation (ATG). ATG was first tackled in [34]
and then further developed in [35] and other works. As mentioned in [40], the core of MRKP
operations likewise does not differ from ATG [34,35]. As in ATG, the subsystem of representation
is supposed to synthesize a consistent system of tessellata constructed at different resolutions and
transformable into one another. Since the mechanisms of generalization are involved, it is
important to emphasize that any process of representation is based upon theory generation.



⁵ A goal is commonly understood as the state to be achieved. Thus $H_G$ and $H_S$
are both state descriptions: the desirable one and the current one.




Figure 4. Multiresolutional Intelligent Controller. Panel labels: PERCEPTS-CONCEPTS; CONCEPTS-CONCEPTS; nested hierarchy of the perceived world; nested hierarchy of the content; nested hierarchical planning/control team of decision-makers.











The existence of vocabularies for tasks at all levels is tacitly presumed. In practice, however, these vocabularies usually exist only partially. The need for a mechanism that synthesizes new words for the task vocabularies was recognized by J. Albus long ago and presented as a list-processing mechanism, or cerebellar controller (CMAC, Cerebellar Model Articulation Controller) [49-52]. Many researchers use CMAC for learning purposes; however, it remains what it was meant to be from the beginning: a conceptron that performs combinatorial synthesis on a set with particular properties.
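For readers unfamiliar with CMAC, here is a minimal one-dimensional sketch of the usual Albus-style mechanism, shown in its common learning use. Several coarse tilings, each shifted by a fraction of a tile width, map an input to one active cell per tiling; the output is the sum of the active weights, and training spreads an LMS-style correction over the active cells. The tile count, number of tilings, learning rate, and the target function are illustrative assumptions, not values taken from [49-52].

```python
import math


class CMAC1D:
    """Minimal 1-D CMAC: n_tilings offset tilings over [lo, hi], one weight per tile."""

    def __init__(self, lo: float, hi: float, n_tiles: int = 10,
                 n_tilings: int = 8, lr: float = 0.2):
        self.lo, self.n_tilings, self.lr = lo, n_tilings, lr
        self.width = (hi - lo) / n_tiles
        # one extra tile per tiling so that the shifted tilings still cover [lo, hi]
        self.weights = [[0.0] * (n_tiles + 1) for _ in range(n_tilings)]

    def active_cells(self, x: float):
        """Return the (tiling, tile) index pairs activated by input x."""
        cells = []
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings
            cells.append((t, int((x - self.lo + offset) // self.width)))
        return cells

    def predict(self, x: float) -> float:
        return sum(self.weights[t][i] for t, i in self.active_cells(x))

    def train(self, x: float, target: float) -> None:
        error = target - self.predict(x)
        for t, i in self.active_cells(x):
            # the correction is shared equally among the active cells
            self.weights[t][i] += self.lr * error / self.n_tilings


# Usage: learn sin(x) on [0, pi] from 51 samples (assumed toy task)
cmac = CMAC1D(0.0, math.pi)
xs = [k * math.pi / 50 for k in range(51)]
for _ in range(200):
    for x in xs:
        cmac.train(x, math.sin(x))
print(round(cmac.predict(math.pi / 2), 2))
```

Because neighboring inputs share most of their active cells, each correction generalizes locally; this overlap of cells is the property that makes CMAC usable as a synthesizer of combinations as well as a function approximator.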

Supplemented by CMAC (we will denote it C), expression (1) changes as follows:

$$\mathrm{P/C}: (H_G, H_S) \rightarrow \mathrm{DM}\left[\, C(H_G, H_S) \rightarrow \{H_T\} \,\right] \rightarrow [H_T]^{x}_{\max} \qquad (2)$$

where DM is an operator of decision making applied to the result of the combinatorial synthesis (see [54]),

$\{H_T\}$ is a set of alternatives obtained at the output of CMAC,

$[H_T]^{x}_{\max}$ is the result of choice,

$x$ is the index of the hierarchy for which the maximum solution is found, i.e., “m” for the main hierarchy of decomposition, or “1, 2, ...” (where 1 denotes a hierarchy of task decomposition that starts at the level under consideration and remains within this level, 2 a hierarchy of task decomposition for another upper-level goal intersecting with the main hierarchy, 3 a hierarchy of task decomposition for one of the cooperative processes, etc.).
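Expression (2) can be read as a generate-then-choose pipeline. The sketch below assumes a hypothetical one-dimensional world: states are integers, the task vocabulary consists of a few hypothetical moves with assumed costs, the C-operator enumerates every task string up to a given length that leads from the current state to the goal, and the DM-operator picks the alternative that maximizes an assumed utility (negative total cost). Exhaustive enumeration merely stands in for whatever combinatorial synthesis an actual controller would use, and the hierarchy index x is not modeled.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical task vocabulary: name -> (state change, cost)
VOCABULARY: Dict[str, Tuple[int, float]] = {
    "step_fwd": (1, 1.0),
    "jump_fwd": (3, 2.5),
    "step_back": (-1, 1.0),
}


def c_operator(start: int, goal: int, max_len: int) -> List[Tuple[str, ...]]:
    """Combinatorial synthesis: every task string (up to max_len) that reaches the goal."""
    alternatives = []
    for n in range(1, max_len + 1):
        for plan in product(VOCABULARY, repeat=n):
            if start + sum(VOCABULARY[t][0] for t in plan) == goal:
                alternatives.append(plan)
    return alternatives


def utility(plan: Tuple[str, ...]) -> float:
    """Assumed utility: the cheaper the plan, the better."""
    return -sum(VOCABULARY[t][1] for t in plan)


def dm_operator(alternatives: List[Tuple[str, ...]]) -> Tuple[str, ...]:
    """Decision making: choose the alternative with maximum utility."""
    return max(alternatives, key=utility)


alternatives = c_operator(start=0, goal=4, max_len=4)   # plays the role of {H_T}
best = dm_operator(alternatives)                        # plays the role of [H_T]_max
print(best)  # ('step_fwd', 'jump_fwd')
```

Separating synthesis from choice means either operator can be swapped independently, for example a neural or rule-based generator with the same decision rule.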

A single P/C at a level can be considered a Markov controller (e.g., as in [53]). For hierarchies, this process is described in [54]. It consists of both the C-operator for synthesis of the alternatives and a subsequent DM-operator for choice. The C-operator has many incarnations in the contemporary literature, including CMAC [51, 52], the Markov Controller [53], the Conceptron [54], various neural-network-based solutions, etc. The early GPS [55] and the more recent SOAR [56] can be considered examples of possible C-operators for the upper (linguistic) levels of the NICS hierarchy of planning/control. Servo levels often have a C-operator too if search or dynamic programming is applied in the controller.
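To illustrate how a servo level acquires a C-operator when dynamic programming is used, here is a minimal sketch on an assumed graph of quantized states with assumed transition costs. The backward Bellman recursion implicitly evaluates all alternative control sequences (the synthesis part), and selecting the minimizing successor at each state is the choice part. The graph and its costs are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical quantized servo states and transition costs: state -> [(next_state, cost)]
GRAPH: Dict[str, List[Tuple[str, float]]] = {
    "s0": [("s1", 2.0), ("s2", 5.0)],
    "s1": [("s2", 1.0), ("goal", 6.0)],
    "s2": [("goal", 2.0)],
    "goal": [],
}


def cost_to_go(graph, goal: str) -> Dict[str, float]:
    """Bellman recursion V(s) = min over successors of c(s, s') + V(s'), by repeated relaxation."""
    value = {s: math.inf for s in graph}
    value[goal] = 0.0
    # |S| sweeps are more than enough when there are no negative-cost cycles
    for _ in range(len(graph)):
        for s, edges in graph.items():
            for nxt, c in edges:
                value[s] = min(value[s], c + value[nxt])
    return value


def policy(graph, value) -> Dict[str, str]:
    """Choice (DM) step: at each state pick the successor minimizing cost-to-go."""
    return {s: min(edges, key=lambda e: e[1] + value[e[0]])[0]
            for s, edges in graph.items() if edges}


V = cost_to_go(GRAPH, "goal")
print(V["s0"])           # 5.0
print(policy(GRAPH, V))  # {'s0': 's1', 's1': 's2', 's2': 'goal'}
```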

Practical conclusions from (2) can be listed as follows: 

- Since each level of the multiresolutional system of representation has within-level (horizontal) relationships as well as branching new hierarchies and intersections, it would be improper to speak of hierarchies of representation or hierarchies of task decomposition. Clearly, in all of these cases we are dealing with heterarchical representations.

- Three separate heterarchies should be maintained to support the P/C operations: $H_G$, $H_S$, and $H_T$. The first two, $H_G$ and $H_S$, are state descriptions and can be associated with various aspects of the World Model from [6]. Both heterarchies can be maintained using the same system of representation. The third heterarchy, $H_T$, can be interpreted as the transition function of the corresponding automata heterarchy. This heterarchy is usually expected to use a different representation system.







- A uniform approach to building vocabularies, grammars, and axioms can be recommended for all levels of the multiresolutional heterarchies. Every level is an object/task level. At the lower levels (execution of commands), where analytical representation is very common (both traditionally and because of the abundance of standard software packages), it is easy to switch from an analytical representation to an automaton representation, and then from the automaton representation to look-up tables, which makes these levels uniform with the higher (linguistic) levels. This approach has been implemented for intelligent material processing (see [57, 58]).
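The switch from an analytical representation to an automaton and then to a look-up table can be sketched as follows. This is a minimal illustration assuming a hypothetical proportional control law u = K(x_goal - x), a uniform quantization of the state, and a plant model x_{k+1} = x_k + u_k; the gain, step, and plant are assumptions, not taken from [57, 58]. Once tabulated, the servo level is just a mapping from quantized state to (command, next state), the same kind of object that a linguistic level would store.

```python
from typing import Dict, Tuple

K = 0.5      # assumed proportional gain
STEP = 1.0   # assumed quantization step of the state
STATES = [i * STEP for i in range(0, 11)]  # quantized states 0.0 .. 10.0


def analytic_control(x: float, x_goal: float) -> float:
    """Analytical representation: u = K * (x_goal - x)."""
    return K * (x_goal - x)


def quantize(x: float) -> float:
    """Snap to the nearest quantized state, clipped to the grid.
    Note: Python's round() rounds exact halves to the even integer."""
    return min(max(round(x / STEP) * STEP, STATES[0]), STATES[-1])


def build_lookup(x_goal: float) -> Dict[float, Tuple[float, float]]:
    """Automaton stored as a look-up table: state -> (command, next quantized state),
    assuming the plant x_next = x + u."""
    table = {}
    for x in STATES:
        u = analytic_control(x, x_goal)
        table[x] = (u, quantize(x + u))
    return table


table = build_lookup(x_goal=8.0)
x = 0.0
trajectory = [x]
for _ in range(len(STATES)):   # bounded run of the automaton
    nxt = table[x][1]
    if nxt == x:               # fixed point (goal) reached
        break
    x = nxt
    trajectory.append(x)
print(trajectory)  # [0.0, 4.0, 6.0, 7.0, 8.0]
```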

4.2 Two Types of Task Decomposition 

Task (command) is defined in [1] as an instruction

DO <Task> AFTER <Start Event> UNTIL <Goal Event>,

or

TASK COMMANDS DO <TASK>
    WHEN (START EVENT)
    DO (TASK)
    UNTIL (GOAL EVENT)
    END-DO
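The DO <Task> AFTER <Start Event> UNTIL <Goal Event> form can be read as a guarded control loop. A minimal sketch, assuming a hypothetical scalar state, start and goal events given as predicates over the state, and a task whose action is a one-step state update; the `Task` class and `execute` function are illustrative names, not taken from [1]. This sketch refuses to start if the start event does not hold, rather than waiting for it.

```python
from dataclasses import dataclass
from typing import Callable, List

State = float


@dataclass
class Task:
    """Hypothetical encoding of DO <action> AFTER <start event> UNTIL <goal event>."""
    name: str
    start_event: Callable[[State], bool]
    goal_event: Callable[[State], bool]
    action: Callable[[State], State]


def execute(task: Task, state: State, max_steps: int = 100) -> List[State]:
    """Apply the action once the start event holds, until the goal event holds."""
    if not task.start_event(state):
        raise ValueError(f"start event of {task.name!r} does not hold")
    history = [state]
    for _ in range(max_steps):
        if task.goal_event(state):
            return history
        state = task.action(state)
        history.append(state)
    raise RuntimeError(f"goal event of {task.name!r} not reached in {max_steps} steps")


approach = Task(
    name="approach",
    start_event=lambda x: x <= 0.0,   # assumed start condition
    goal_event=lambda x: x >= 3.0,    # assumed goal condition
    action=lambda x: x + 1.0,         # assumed one-step control
)
print(execute(approach, 0.0))  # [0.0, 1.0, 2.0, 3.0]
```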

One can see that tasks are controls. The statement DO <Task> is supposed to be understood by the performer: either by the next lower level of the resolution hierarchy, which will DO the next decomposition of the TASK, or by the execution actuator, which transforms the label of the TASK into an ACTION. In all of these cases the strings of tasks can be visualized as control laws. A control command can be applied if the state is known (“start event”) and the final state for this particular control command is determined (“goal event”). Hence, the hierarchy of TASK DECOMPOSITION ($H_T$) can be generated by the PLANNING/CONTROL SYSTEM if two additional hierarchies are given: a hierarchy of CURRENT STATE DECOMPOSITION ($H_S$) and a hierarchy of FINAL STATE, or GOAL DECOMPOSITION ($H_G$).

A set of n consecutive tasks (subtasks) for the i-th level that represents a solution for the task $T_{i-1}$ of the upper (i-1)-th level,

$$T_{i-1} \rightarrow P_i \rightarrow \{T_i(t_0, t_1);\ T_i(t_1, t_2);\ \dots;\ T_i(t_n, t_{n+1}) \rightarrow G_{i-1}(t_f)\}, \quad 1 \le n < f, \qquad (3)$$

is called a PLAN (footnote 8), or $P_i$, to be performed at the i-th level when the task $T_{i-1}$ is to be executed at the (i-1)-th level of the hierarchy of task decomposition. The task $T_{i-1}$ of the (i-1)-th level under consideration can, in general, be decomposed into a set of m parallel plans

$$T_{i-1} \rightarrow \{P_{i1}(t_0, t_f);\ \dots;\ P_{im}(t_0, t_f)\} \qquad (4)$$
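The decomposition in (3) can be applied recursively to generate $H_T$ from the start and goal states. The sketch below assumes a hypothetical one-dimensional task "move from start to goal", a fixed number n of consecutive subtasks per level, and decomposition by evenly spaced intermediate goals; the level index grows toward higher resolution. Each plan is a string of consecutive subtasks whose last goal coincides with the goal of the parent task, as in (3). The parallel plans of (4) are not modeled, and `TaskNode`, `decompose`, and `leaves` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskNode:
    """Hypothetical task T_i(start, goal) at level i, with its plan at level i + 1."""
    level: int
    start: float
    goal: float
    plan: List["TaskNode"] = field(default_factory=list)


def decompose(task: TaskNode, n: int, max_level: int) -> TaskNode:
    """Split a task into n consecutive subtasks with evenly spaced intermediate goals."""
    if task.level >= max_level:
        return task
    points = [task.start + (task.goal - task.start) * k / n for k in range(n + 1)]
    task.plan = [
        decompose(TaskNode(task.level + 1, points[k], points[k + 1]), n, max_level)
        for k in range(n)
    ]
    return task


def leaves(task: TaskNode) -> List[TaskNode]:
    """Tasks at the finest level, i.e. those handed to the execution actuators."""
    if not task.plan:
        return [task]
    return [leaf for sub in task.plan for leaf in leaves(sub)]


root = decompose(TaskNode(level=0, start=0.0, goal=8.0), n=2, max_level=3)
print([(t.start, t.goal) for t in leaves(root)])
```

With n = 2 and three levels below the root, the 0 to 8 task becomes eight unit subtasks at the finest level, and every intermediate plan ends exactly at its parent's goal.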



8 Thus, a plan can be defined as follows: a plan is a combination of tasks at the i-th level that are determined as a solution for the problem formulated as a task at the (i-1)-th level of the multiresolutional hierarchy of a system.




